Abstract

Bayesian inference is applied directly to the problem of unfolding. The
outcome is a posterior probability density for the spectrum before smearing,
defined in the multi-dimensional space of all possible spectra.
Regularization consists in choosing a non-constant prior. Despite some
similarity, the fully bayesian unfolding (FBU) method, presented here, should
not be confused with D'Agostini's iterative method [1].

1 Introduction

Unfolding, in the context of this document, 
is the non-parametric inference of a binned 
histogram. An overview of unfolding, geared 
towards High Energy Physics (HEP), is in 
Ref. [2], and Ref. [3] contains details on pub- 
lished examples of unfolding from ATLAS. 

The motivation for this study is to un- 
derstand unfolding from first principles. This 
sheds light on characteristics common to other 
unfolding methods too. 

The fully bayesian unfolding (FBU) offers a complete solution to unfolding,
which can be expressed analytically through Bayes' theorem (Sec. 2) and
computed numerically (Sec. 5.2). When dealing with Poisson-distributed data,
the answer of FBU is valid even in regions of low statistics where the
Poisson probability is not approximated well by a Gaussian, or when the
answer is known to respect boundaries (e.g., Poisson expectation values must
be positive), or when regularization conditions distort the posterior's
shape. Such examples are shown. Here, FBU is formulated for
Poisson-distributed data, but an obvious modification in Eq. 3 would enable
it to use non-Poisson data.

FBU differs from D'Agostini's iterative unfolding [1, 4], despite both using
Bayes' theorem. In FBU the answer is not an estimator and its covariance
matrix, but a posterior probability density defined in the space of possible
spectra. FBU does not involve iterations, thus does not depend on a
convergence criterion^1, nor on the first point of an iterative procedure,
which in [1] is named "prior". If more than one answer is equally likely, as
can happen when the reconstructed spectrum has fewer bins than the inferred
one, then FBU reveals all of them, while iterative unfolding converges
towards some of the possible solutions. Regularization (see Sec. 7) is not
done by interrupting iterations, but by choosing a prior which favors certain
characteristics, such as smoothness. Thus, FBU offers intuition and full
control of the regularizing condition, which makes the answer easy to
interpret.

1 The so-called "convergence criterion" in iterative unfolding actually is
there to prevent convergence, but only after a reasonable number of
iterations. If the goal was convergence, then extra iterations would only
make it better.

FBU differs significantly also from SVD un- 
folding [5]. In FBU the migrations matrix (de- 
fined in Sec. 1.1) is not distorted by singular 
value decomposition, therefore FBU assumes 
the intended migrations model. The answer of 
FBU is not an estimator plus covariance ma- 
trix, but a probability density function which 
does not have to be Gaussian, which is impor- 
tant especially in bins with small Poisson event 
counts. FBU does not involve matrix inversion 
and computation of eigenvalues, which makes 
it more stable numerically. Finally, SVD imposes
curvature regularization (see $S_2(\mathbf{T})$ in
Sec. 7), while FBU offers the freedom to use
different regularization choices. This freedom 
becomes necessary when the correct answer ac- 
tually has large curvature, or when the answer 
has only two bins, thus curvature is not even 
defined. 

1.1 Nomenclature 

The following definitions are used: 

Spectrum: A binned histogram showing the 
distribution of entries ( "events" ) in some 
observable quantity, m. 

Smearing: Any stochastic effect which re- 
sults in classifying (or "reconstructing") 
events in a wrong bin, i.e., in a bin other 
than where they would be if the true 
value of m was always reconstructed. 

Truth: A "truth-level" (or "truth") spectrum contains in each bin
($t \in \{1, 2, \ldots, N_t\}$) a number $T_t \in \mathbb{R}$, which is the
number of events expected to be produced in that bin, before reconstruction.
A truth spectrum is represented by an $N_t$-tuple
$\mathbf{T} = (T_1, T_2, \ldots, T_{N_t})$, corresponding to a point in an
$N_t$-dimensional space. Two symbols are reserved for two special T-points:
The truth-level spectrum from which the data (see below) actually originate,
i.e., nature's truth-level spectrum, is $\hat{\mathbf{T}}$. The truth-level
spectrum followed by the MC events that populate the migrations matrix (see
below) is $\tilde{\mathbf{T}}$. In the ideal case where MC reproduces
reality, $\tilde{\mathbf{T}} = \hat{\mathbf{T}}$.

Reco: A "reco-level" (or "reco") spectrum contains in each bin
($r \in \{1, 2, \ldots, N_r\}$) a number $R_r \in \mathbb{R}$, which is the
number of events expected to be reconstructed in that bin. A reco spectrum is
represented by an $N_r$-tuple $\mathbf{R} = (R_1, R_2, \ldots, R_{N_r})$.

Data: A spectrum where each bin ($r \in \{1, 2, \ldots, N_r\}$) contains a
number $D_r \in \mathbb{N}$, which is the number of events observed in that
bin, after smearing. A data spectrum is represented by an $N_r$-tuple
$\mathbf{D} = (D_1, D_2, \ldots, D_{N_r})$. Without loss of generality, and
since this is the most common use case in HEP, it is assumed that $D_r$
follows a Poisson distribution of mean $R_r$.

Migrations matrix: A matrix $\mathcal{M}$ whose element $\mathcal{M}_{tr}$ is
the joint probability $P(t, r)$ of an event to be produced in the truth-level
bin $t$ and reconstructed in the reco-level bin $r$.

Response matrix: A matrix $\mathcal{P}$ whose element $\mathcal{P}_{tr}$ is
the conditional probability $P(r|t)$ for an event that was produced in the
truth-level bin $t$ to be reconstructed in the reco-level bin $r$.

Unfolded: An unfolded spectrum helps visualize, at least in a limited way,
the result of FBU. Each bin ($t \in \{1, 2, \ldots, N_t\}$) contains a number
$U_t \in \mathbb{R}$, which is the estimated value of the actual truth-level
$\hat{T}_t$. The error bars in each bin show the shortest interval where
$\hat{T}_t$ is inferred to be with probability^2 68%. Let this interval be
denoted by $[U_t^{\ulcorner}, U_t^{\urcorner}]$. More details in Sec. 5.3. An
unfolded spectrum is represented by an $N_t$-tuple
$\mathbf{U} = (U_1, U_2, \ldots, U_{N_t})$.



2 Formulation 

In unfolding, the question is:

"Given the data ($\mathbf{D}$) and the migrations model, what was the
actual truth-level spectrum ($\hat{\mathbf{T}}$)?"

This is a quintessentially bayesian question, 
since the true value of an unknown is asked. 
It is answered by Bayes' theorem: 



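$$
p(\mathbf{T}|\mathbf{D},\mathcal{M}) = L(\mathbf{D}|\mathbf{T},\mathcal{M}) \cdot
\frac{\pi(\mathbf{T},\mathcal{M})}{\text{Norm. Const.}}
\tag{1}
$$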



If the migrations model, represented here by $\mathcal{M}$, is not uncertain,
it can be omitted to simplify the notation. So,

$$
p(\mathbf{T}|\mathbf{D}) \propto L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T}).
\tag{2}
$$



To each point $\mathbf{T}$ in the space of truth spectra corresponds some
probability density, $p(\mathbf{T}|\mathbf{D})$, that $\mathbf{T}$ be the
correct truth-level spectrum $\hat{\mathbf{T}}$. This probability density
depends (i) on the observation ($\mathbf{D}$), (ii) on the smearing model
encoded in $\mathcal{M}$, (iii) on the probability model followed by the
data, e.g., Poisson, and (iv) on the prior probability density
$\pi(\mathbf{T})$.

Assuming that the data follow Poisson statistics, the likelihood is:

$$
L(\mathbf{D}|\mathbf{T}) = \prod_{r=1}^{N_r} \frac{R_r^{D_r}}{D_r!}\, e^{-R_r},
\tag{3}
$$

where

$$
R_r = \sum_{t=1}^{N_t} T_t \, P(r|t).
\tag{4}
$$



$P(r|t)$ is the probability to be reconstructed in bin $r$, given that the
truth-level bin (before smearing) is $t$. This is extracted from
$\mathcal{M}$:

$$
P(r|t) = \frac{P(t,r)}{P(t)}
       = \frac{\mathcal{M}_{tr}}{\epsilon_t^{-1} \sum_{k=1}^{N_r} \mathcal{M}_{tk}},
\tag{5}
$$



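where $\epsilon_t$ is the efficiency of row $t$ of $\mathcal{M}$. This is the
probability that an event produced in truth-level bin $t$ will be
reconstructed in one of the $N_r$ reco-level bins included in $\mathcal{M}$.
So,

$$
\epsilon_t = \frac{\sum_{r=1}^{N_r} P(t,r)}{P(t)}
           = \frac{\sum_{k=1}^{N_r} \mathcal{M}_{tk}}{P(t)}.
\tag{6}
$$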
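
As a hedged illustration of Eqs. 5 and 6, the sketch below derives the response matrix and the efficiencies from a migrations matrix stored as a NumPy array. The function name `response_from_migrations`, the argument `p_truth` (the probabilities $P(t)$, counting also events that are never reconstructed) and the toy numbers are assumptions made for this example; they are not taken from the original study.

```python
import numpy as np

def response_from_migrations(migrations, p_truth):
    """Return (response, efficiency) following Eqs. 5 and 6.

    migrations[t, r] is the joint probability P(t, r);
    p_truth[t] is P(t), the probability to be produced in truth bin t,
    whether or not the event is reconstructed (hypothetical input).
    """
    migrations = np.asarray(migrations, dtype=float)
    p_truth = np.asarray(p_truth, dtype=float)
    # Eq. 5: P(r|t) = P(t, r) / P(t), one row per truth bin.
    response = migrations / p_truth[:, None]
    # Eq. 6: efficiency = sum over reco bins of P(t, r), divided by P(t).
    efficiency = migrations.sum(axis=1) / p_truth
    return response, efficiency

# Toy example: 2 truth bins, 2 reco bins; 10% of the events produced
# in the second truth bin are not reconstructed in any reco bin.
M = np.array([[0.50, 0.10],
              [0.08, 0.28]])
p_t = np.array([0.60, 0.40])
P, eff = response_from_migrations(M, p_t)
```

Each row of the response matrix sums to the efficiency of that truth bin, which is a quick consistency check on any migrations matrix built from MC.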



Often the data include contamination from 
background processes, such as noise, or "fakes" 
(to use an example from HEP), namely events 
which are in D but don't originate from T at 
truth-level. The total background has to be 
taken into account in the likelihood. The only 
change, in this case, is that Eq. 4 should become

$$
R_r = B_r + \sum_{t=1}^{N_t} T_t \, P(r|t),
\tag{7}
$$



where $B_r$ is the expected number of background events in bin $r$. In matrix
notation,

$$
\mathbf{R} = \mathbf{B} + \mathcal{P}^T \mathbf{T}.
\tag{8}
$$



This lays out the fundamentals of FBU. 
The solution is written down analytically in 
Eq. 2, and much of the rest of this document 
deals with its computation. 
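
To make the likelihood concrete, here is a minimal sketch of the logarithm of Eq. 3, with the reco-level expectation computed as in Eqs. 7 and 8. The function name `log_likelihood`, the array layout (`response[t, r]` holding $P(r|t)$) and the toy inputs are assumptions of this example, not the program used for the study.

```python
import math
import numpy as np

def log_likelihood(truth, data, response, background=None):
    """ln L(D|T) for Poisson data (Eq. 3), with R = B + P^T T (Eqs. 7-8).

    response[t, r] holds P(r|t); background[r] holds B_r (optional).
    """
    truth = np.asarray(truth, dtype=float)
    data = np.asarray(data, dtype=float)
    reco = np.asarray(response, dtype=float).T @ truth  # R_r = sum_t T_t P(r|t)
    if background is not None:
        reco = reco + np.asarray(background, dtype=float)
    # A negative mean, or a zero mean with observed events, is impossible.
    if np.any(reco < 0.0) or np.any((reco == 0.0) & (data > 0.0)):
        return -math.inf
    ok = reco > 0.0  # bins with R_r = 0 and D_r = 0 contribute ln(1) = 0
    log_fact = np.array([math.lgamma(d + 1.0) for d in data[ok]])  # ln(D_r!)
    return float(np.sum(data[ok] * np.log(reco[ok]) - reco[ok] - log_fact))

# Toy usage: identity response (no smearing), two bins.
P = np.eye(2)
D = np.array([100, 50])
ll_at_data = log_likelihood([100.0, 50.0], D, P)  # T equal to the data
ll_shifted = log_likelihood([80.0, 50.0], D, P)   # a less likely spectrum
```

Working with the logarithm avoids numerical underflow when the product in Eq. 3 runs over many bins with large event counts.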

3 Conceptualization 

A pervasive problem in unfolding is that the maximum likelihood estimator
(MLE) of the truth-level spectrum, which is unbiased and given by
$(\mathcal{P}^T)^{-1}\mathbf{D}$ when $\mathcal{P}$ is invertible, suffers
from great variance. Due to this, the MLE often looks unnatural, with bin
contents that vary wildly in alternating directions from bin to bin, like saw
teeth.

2 Since FBU uses Bayes' theorem, these error bars define an interval that
integrates 68% probability, or "credibility". This percentage should not be
confused with frequentist coverage, and the bayesian credibility intervals
should not be confused with confidence intervals.

Regularization is introduced in order to 
suppress this variance. The classical descrip- 
tion of regularization is given beautifully by 
Cowan in [2]: The MLE is the unbiased esti- 
mator with the smallest possible variance, even 
though the latter is huge. In order to avoid this 
variance it is necessary to introduce bias to the 
estimator. 

To introduce bias, instead of maximizing $L(\mathbf{D}|\mathbf{T})$, the
estimator is obtained by maximizing:

$$
L(\mathbf{T}) = L(\mathbf{D}|\mathbf{T}) \cdot e^{-\alpha S(\mathbf{T})},
\tag{9}
$$

with $\alpha$ being a positive regularization parameter, and $S(\mathbf{T})$
an arbitrary regularization function that increases with some undesired
property, such as "non-smoothness". The resulting estimator is a compromise
between high likelihood, which is desirable, and large $S(\mathbf{T})$, which
is not.

The mode of $L(\mathbf{T})$ is obviously not the MLE. It is, however, the
mode of the FBU posterior $p(\mathbf{T}|\mathbf{D})$, since
$L(\mathbf{T}) \propto p(\mathbf{T}|\mathbf{D})$. The latter becomes obvious
if the prior in Eq. 2 is rewritten as

$$
\pi(\mathbf{T}) = e^{-\alpha S(\mathbf{T})}.
\tag{10}
$$

Even without regularization, i.e., when $S(\mathbf{T})$ is constant, or
$\alpha = 0$, the classical estimator is still the mode of the posterior,
which then coincides with the MLE (see [6], Sec. 6.13).

The classical estimator is always (with or without regularization) the mode
of the FBU posterior $p(\mathbf{T}|\mathbf{D})$, after assuming the prior
which corresponds to the same regularization (Eq. 10).
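
Written out in logarithms, the correspondence between Eqs. 2, 9 and 10 is a short derivation. The additive constant below collects the normalization of the posterior and does not depend on $\mathbf{T}$:

```latex
\ln p(\mathbf{T}|\mathbf{D})
  = \ln L(\mathbf{D}|\mathbf{T}) + \ln \pi(\mathbf{T}) + \text{const}
  = \ln L(\mathbf{D}|\mathbf{T}) - \alpha\, S(\mathbf{T}) + \text{const}
  = \ln L(\mathbf{T}) + \text{const}
\quad\Longrightarrow\quad
\arg\max_{\mathbf{T}}\, p(\mathbf{T}|\mathbf{D})
  = \arg\max_{\mathbf{T}}\, L(\mathbf{T}).
```

The two functions differ only by a factor independent of $\mathbf{T}$, so they share the same mode, while the posterior additionally carries the full shape around that mode.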

Without regularization, i.e., with a constant $\pi(\mathbf{T})$, what
classically is described as large variance of the MLE appears in FBU as large
spread in $p(\mathbf{T}|\mathbf{D})$, which is
$\propto L(\mathbf{D}|\mathbf{T})$. So, with FBU it is obvious why
regularization may be desired: if the prior is constant the posterior may be
too wide, therefore very different $\mathbf{T}$'s may be almost equally
likely. The variance of the classical estimator that maximizes
$L(\mathbf{T})$, and the spread of $p(\mathbf{T}|\mathbf{D})$ in FBU, are two
faces of the same coin.

The regularization term $e^{-\alpha S(\mathbf{T})}$ in Eq. 9 reduces the
variance of the classical estimator, just like the prior
$\pi(\mathbf{T}) = e^{-\alpha S(\mathbf{T})}$ reduces the spread of
$p(\mathbf{T}|\mathbf{D})$ in FBU.

The term $e^{-\alpha S(\mathbf{T})}$ in Eq. 9 is, classically, just a way to
introduce bias to the classical estimator of the actual truth-level spectrum
$\hat{\mathbf{T}}$. But at the same time it is the prior in the corresponding
FBU. The classical estimator can be understood as a half-done FBU, where,
instead of the full $p(\mathbf{T}|\mathbf{D})$, only its mode is computed.
Furthermore, instead of computing directly the probability of each
$\mathbf{T}$ to be the actual $\hat{\mathbf{T}}$, the classical procedure
estimates the variance of the mode of $p(\mathbf{T}|\mathbf{D})$ in
pseudo-experiments where $\mathbf{D}$ is substituted with pseudo-data sampled
from $\mathbf{T}_{\text{MLE}}$ (since $\hat{\mathbf{T}}$ is unknown). The
initial question, however, was not how much the mode of the posterior would
vary in pseudo-experiments, but how likely each $\mathbf{T}$ was to be
$\hat{\mathbf{T}}$. The variance of the posterior and the variance of its
mode in pseudo-experiments are clearly related, but are not the same thing.

Seeing regularization as a choice of prior $\pi(\mathbf{T})$ is natural. Even
classical regularization is nothing but an a-priori belief in the smoothness
(or other property) of $\hat{\mathbf{T}}$. That's why bias was not introduced
by some absurd manipulation^3, but through a physically motivated
$S(\mathbf{T})$. The prior is conceptually the host of any such prior
beliefs.

3 With the exception of SVD, where distorting the response matrix seems
physically unmotivated.

4 Generation of MC events

To investigate FBU, Monte Carlo (MC) events are generated and smeared. They
are generated following



$$
\frac{dN}{dm} = \left(1 - \frac{m}{7000}\right)^{6} \Big/ \left(\frac{m}{7000}\right)^{4.8},
\tag{11}
$$

shown in Fig. 1(a), which is inspired by the distribution of dijet masses in
proton collisions at energy $\sqrt{s} = 7000$ GeV.

The observable $m$ is binned. The delimiters of 14 truth-level $m$ bins are
set at

$$
m_n = 500 \cdot e^{0.15\, n}, \quad \text{for } 0 \le n \le 14.
\tag{12}
$$



For simplicity, the same bins are defined in the 
reco-level m, although the formulation of Sec. 2 
is so general that the truth and reco bins don't 
need to share any common properties. This 
will be demonstrated in Sec. 6.5. 

Each MC event, generated at $m = m_{\text{truth}}$, is smeared and
reconstructed at $m = m_{\text{reco}}$, where

$$
m_{\text{reco}} = m_{\text{truth}} + \delta m,
\tag{13}
$$

where $\delta m$ is a random variable following a Gaussian distribution of
mean 0 and standard deviation

$$
\sigma(m_{\text{truth}}) = m_{\text{truth}} \cdot
  \left( \frac{a}{\sqrt{m_{\text{truth}}}} + b \right),
\tag{14}
$$

which is a simplified parametrization of the energy resolution of a
calorimeter.

An example of the resulting $\mathcal{M}$, for $a = 0.5$ and $b = 0.1$, is in
Fig. 2(a). The corresponding efficiencies (Eq. 6) are in Fig. 2(c).
Fig. 2(b) shows the corresponding response matrix $\mathcal{P}$ (see
Sec. 1.1 and Eq. 5). Fig. 1(b) shows the truth, reco, and data spectra.
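
The generation and smearing described in this section can be sketched as follows. This is a minimal illustration, not the program used for the study: the function names, the inverse-transform sampling on a fine grid, the number of events and the random seed are all assumptions, and the exponents of the spectrum are exposed as parameters (`p_num`, `p_den`) whose defaults reflect one reading of Eq. 11.

```python
import numpy as np

def bin_edges(n_bins=14):
    # Eq. 12: delimiters at 500 * exp(0.15 n), for n = 0 .. n_bins.
    return 500.0 * np.exp(0.15 * np.arange(n_bins + 1))

def generate_truth(n_events, edges, rng, p_num=6.0, p_den=4.8, sqrt_s=7000.0):
    # Inverse-transform sampling of dN/dm using a numerical CDF on a grid.
    grid = np.linspace(edges[0], edges[-1], 20001)
    x = grid / sqrt_s
    pdf = (1.0 - x) ** p_num / x ** p_den
    cdf = np.cumsum(pdf)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])
    return np.interp(rng.random(n_events), cdf, grid)

def smear(m_truth, a, b, rng):
    # Eqs. 13-14: Gaussian smearing with sigma = m * (a / sqrt(m) + b).
    sigma = m_truth * (a / np.sqrt(m_truth) + b)
    return m_truth + rng.normal(0.0, 1.0, size=m_truth.shape) * sigma

def migrations_matrix(m_truth, m_reco, edges):
    # Joint probability P(t, r); events smeared out of range are lost.
    counts, _, _ = np.histogram2d(m_truth, m_reco, bins=[edges, edges])
    return counts / len(m_truth)

rng = np.random.default_rng(1)
edges = bin_edges()
m_truth = generate_truth(200_000, edges, rng)
m_reco = smear(m_truth, a=0.5, b=0.1, rng=rng)
M = migrations_matrix(m_truth, m_reco, edges)
p_truth = np.histogram(m_truth, bins=edges)[0] / len(m_truth)
# Efficiency per truth bin; bins with no generated events are set to 0.
efficiency = np.divide(M.sum(axis=1), p_truth,
                       out=np.zeros_like(p_truth), where=p_truth > 0)
```

Events whose smeared mass falls outside the binned range are not counted in any reco bin, which is what makes the efficiencies smaller than 1 near the edges of the spectrum.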

The migrations matrix $\mathcal{M}$ in Fig. 2(a) is ideal because (i) it
reflects the smearing which indeed operated on the MC events, and (ii) it
reflects the correct truth-level spectrum, because
$\tilde{\mathbf{T}} = \hat{\mathbf{T}}$, by construction. In real analyses,
$\mathcal{M}$ is not known to be ideal: (i) the assumed migrations model
might not be realistic, which should be treated as a source of systematic
uncertainty, and (ii) the MC events used to compute $\mathcal{M}$ may not
follow $\hat{\mathbf{T}}$. Most notably,
$\tilde{\mathbf{T}} \neq \hat{\mathbf{T}}$ when an exotic process generates
unexpected events. This scenario is examined in Sec. 6.6.2.

5 Living in $N_t$ dimensions

A technical difficulty with FBU is that $\pi(\mathbf{T})$,
$L(\mathbf{D}|\mathbf{T})$, and $p(\mathbf{T}|\mathbf{D})$ are defined in
$N_t$ dimensions. Defining $\pi(\mathbf{T})$ in $N_t$ dimensions is not
difficult, if limited to simple priors. On the other hand, sampling
$L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$ in many dimensions is
challenging, because it usually is zero at most T-points, and it is difficult
to locate the region where it is non-zero. However, significant progress has
been possible through the development of Markov Chain Monte Carlo (MCMC)
sampling algorithms.

5.1 Defining priors 

The simplest $\pi(\mathbf{T})$ is a constant ("flat") prior. In the case of
Poisson-distributed data, $T_t$ has to be positive in all $t$ bins, therefore
the simplest $\pi(\mathbf{T})$ is an $N_t$-dimensional "step" function:

$$
\pi(\mathbf{T}) \propto
\begin{cases}
1 & \text{if } T_t > 0,\ \forall t \in [1, N_t] \\
0 & \text{otherwise.}
\end{cases}
\tag{15}
$$

For reasons related to sampling (Sec. 5.2), it is impractical to allow
non-zero $\pi(\mathbf{T})$ in the whole, or even half, of the
$N_t$-dimensional space. Only a region of finite volume should be allowed, by
setting $\pi(\mathbf{T}) = 0$ outside of it. This leads to a "box" prior,
which is constant within some finite $N_t$-dimensional rectangle, or
hyper-box, extending from a lower edge $T_t^{\ulcorner}$ to an upper edge
$T_t^{\urcorner}$ in dimension $t$:

$$
\pi(\mathbf{T}) \propto
\begin{cases}
1 & \text{if } T_t \in [T_t^{\ulcorner}, T_t^{\urcorner}],\ \forall t \in [1, N_t] \\
0 & \text{otherwise.}
\end{cases}
\tag{16}
$$

As long as $T_t^{\ulcorner} \geq 0$ for all $t$, the upper edge
$T_t^{\urcorner}$ can be arbitrarily large, so, practically, the allowed
hyper-box can be large enough to be confident that it contains
$\hat{\mathbf{T}}$ and effectively all of $p(\mathbf{T}|\mathbf{D})$, except
for negligible tails that converge exponentially to 0.

With the prior of Eq. 16, nothing is presumed about $\hat{\mathbf{T}}$,
except being within a finite volume. To impose some regularization choice
through $S(\mathbf{T})$ and $\alpha$ (see Sec. 3), one simply uses a
non-constant, finite-volume prior:

$$
\pi(\mathbf{T}) \propto
\begin{cases}
e^{-\alpha S(\mathbf{T})} & \text{if } T_t \in [T_t^{\ulcorner}, T_t^{\urcorner}],\ \forall t \in [1, N_t] \\
0 & \text{otherwise.}
\end{cases}
\tag{17}
$$

5.2 Sampling 

Three sampling strategies are implemented to 
study FBU: 

Grid sampling: Evaluating $L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$ at
$n_0$ equally spaced positions along each dimension. This results in
$n_0^{N_t}$ samples, at the nodes of a regular cartesian grid. The rapidly
increasing number of samples is the reason no examples of this sampling
method are presented, as this is only practical in cases of $N_t < 4$.

Uniform sampling: Sampling $L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$
at uniformly distributed random points in T-space.

MCMC: Using a variation of the Metropolis-Hastings algorithm. The first
sample of $L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$ is taken at
$\mathbf{T} = \tilde{\mathbf{T}}$. The next sample position is proposed
randomly, following a uniform distribution
within a hyper-box centered at the latest
sampled position. Typically, the hyper- 
box has edge length Tt 10 ^* along dimen- 
sion t, though this may be adjusted dif- 
ferently if necessary. If the proposed 
point has greater $L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$ than the
latest sample, it is accepted, and it is registered as the next sample.
Otherwise, the ratio is found between the
$L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$ at the proposed point and
the latest sample, and it is used as the probability to accept the proposed
T-point. The result of this algorithm is a random walk



which drifts towards the most likely re- 
gion in the T-space, and samples it. Im- 
provements, such as MCMC with adap- 
tive step size, are possible. However, this 
basic implementation suffices for the pur- 
poses of this study. 
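
The random walk described under "MCMC" can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used for the study: the function name `metropolis`, the proposal half-widths, the toy two-bin posterior and the starting point are hypothetical. When a proposal is rejected, the current point is recorded again, as in the standard Metropolis algorithm; the text above does not spell this step out.

```python
import numpy as np

def metropolis(log_post, start, half_width, n_samples, rng):
    """Random walk with a uniform proposal in a hyper-box centered on the
    current point. log_post(T) returns ln[L(D|T) * pi(T)], or -inf where the
    prior vanishes. half_width[t] is half the proposal box edge (assumed)."""
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    chain = np.empty((n_samples, current.size))
    for i in range(n_samples):
        proposal = current + rng.uniform(-half_width, half_width)
        proposal_lp = log_post(proposal)
        # Always accept a more probable point; otherwise accept with a
        # probability equal to the ratio of the two posterior values.
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

# Toy posterior: two bins, no smearing, flat box prior on (0, 1000].
data = np.array([100.0, 50.0])

def log_post(T):
    if np.any(T <= 0.0) or np.any(T > 1000.0):
        return -np.inf
    return float(np.sum(data * np.log(T) - T))  # Poisson ln L up to a constant

rng = np.random.default_rng(0)
chain = metropolis(log_post, start=data, half_width=np.array([10.0, 7.0]),
                   n_samples=20000, rng=rng)
```

Comparing logarithms instead of ratios keeps the acceptance test numerically stable when the posterior values are very small.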

5.3 Marginalizing 

Each sample is a value of $L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$,
which is proportional to $p(\mathbf{T}|\mathbf{D})$, at a given point
$\mathbf{T}$ in the $N_t$-dimensional T-space. The set of samples contains
information about the shape of $p(\mathbf{T}|\mathbf{D})$ in $\mathbf{T}$,
which is the essential output of FBU, but is difficult to visualize. One
typically wants the unfolded spectrum (see Sec. 1.1).

To produce the unfolded spectrum, one must first compute from
$p(\mathbf{T}|\mathbf{D})$ a set of 1-dimensional marginal posteriors,
$p_t(T_t|\mathbf{D})$, for $t \in [1, \ldots, N_t]$. The shortest interval in
$T_t$ that integrates 68% of $p_t(T_t|\mathbf{D})$, denoted by
$[U_t^{\ulcorner}, U_t^{\urcorner}]$, is shown as error bars in bin $t$ of
the unfolded spectrum.
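
Finding the shortest interval that holds a given probability can be sketched with a sliding window over a finely binned marginal. The function name `shortest_interval`, its arguments and the toy marginal are assumptions of this example; it only illustrates the idea and is not the code used for the study.

```python
import numpy as np

def shortest_interval(edges, prob, level=0.68):
    """Shortest contiguous range of bins whose total probability >= level.

    edges: bin edges (length n + 1); prob: non-negative bin contents
    (length n), normalized internally so that they sum to 1.
    """
    prob = np.asarray(prob, dtype=float)
    prob = prob / prob.sum()
    cum = np.concatenate(([0.0], np.cumsum(prob)))  # cum[j]-cum[i] = sum(prob[i:j])
    n = len(prob)
    best = (edges[0], edges[-1])
    j = 0
    for i in range(n):
        j = max(j, i + 1)
        # Grow the window to the right until it holds enough probability.
        while j < n and cum[j] - cum[i] < level:
            j += 1
        if cum[j] - cum[i] < level:
            break  # no window starting at bin i or later can reach the level
        if edges[j] - edges[i] < best[1] - best[0]:
            best = (edges[i], edges[j])
    return best

# Toy marginal with a Poisson-like asymmetric shape, p(T) ~ T^3 exp(-T).
edges = np.linspace(0.0, 20.0, 401)
centers = 0.5 * (edges[:-1] + edges[1:])
low, high = shortest_interval(edges, centers**3 * np.exp(-centers))
midpoint = 0.5 * (low + high)
```

For an asymmetric marginal the interval is not centered on the mode, which is why the choice of the bin content needs a convention.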

There are several options to define the bin content, $U_t$:

i) $U_t$ be the most likely value of $T_t$. It may take too many samples to
accurately estimate the mode of $p_t(T_t|\mathbf{D})$ when $N_t$ is large.

ii) $U_t$ be the expectation value of $T_t$.

iii) $U_t$ be in the middle of $[U_t^{\ulcorner}, U_t^{\urcorner}]$. Unlike i
and ii, this does not require handling any asymmetric error bars.

iv) $U_t$ be the mean of a Gaussian (or other function) fitted to
$p_t(T_t|\mathbf{D})$, and $[U_t^{\ulcorner}, U_t^{\urcorner}]$ reflect the
mean ± standard deviation of this Gaussian. The assumption of a
parametrization for $p_t(T_t|\mathbf{D})$ is problematic, especially if
$\pi(\mathbf{T})$ is not constant, and fitting is a source of possible
complications.

Option iii is used, for its practical advantage, since $U_t$ itself is not as
important as $[U_t^{\ulcorner}, U_t^{\urcorner}]$. $U_t$ probably should not
be used in any computation by itself, whereas
$[U_t^{\ulcorner}, U_t^{\urcorner}]$ could be compared to some theoretical
$T_t$, to see if it is favored. One should remember, after all, that in FBU
the real answer is $p(\mathbf{T}|\mathbf{D})$, and the unfolded spectrum is a
mere device to visualize that.

The rest of this section discusses how sam- 
ples are used to compute marginal posteriors 
from p(T|D). 

The marginal probability density in dimension $t$ is

$$
p_t(T_t|\mathbf{D}) = \iint p(\mathbf{T}|\mathbf{D})\,
  dT_1 \ldots dT_{t-1}\, dT_{t+1} \ldots dT_{N_t}.
\tag{18}
$$

From this, one defines a finely binned probability distribution, with a
parameter $\delta$ controlling the bin size along $T_t$. The probability
integrated in $[T_t - \delta, T_t + \delta]$ is^4

$$
P_t(T_t|\mathbf{D}) = \int_{T_t-\delta}^{T_t+\delta} p_t(T_t'|\mathbf{D})\, dT_t'.
\tag{19}
$$

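Let there be $N_s$ samples, with sample $i \in [1, N_s]$ taken at point
$\mathbf{T}_i = (T_{i,1}, T_{i,2}, \ldots, T_{i,N_t})$. The value of this
sample is
$w_i = L(\mathbf{D}|\mathbf{T}_i) \cdot \pi(\mathbf{T}_i) \propto p(\mathbf{T}_i|\mathbf{D})$.
The boolean

$$
B(i,t,T_t) =
\begin{cases}
1, & \text{if } T_{i,t} \in [T_t - \delta, T_t + \delta), \\
0, & \text{otherwise}
\end{cases}
\tag{20}
$$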

allows summing only the samples ($i$) for which the $t$-th element of
$\mathbf{T}_i$ belongs in the bin which contains the value $T_t$:

$$
W(t,T_t) = \sum_{i=1}^{N_s} w_i \, B(i,t,T_t).
\tag{21}
$$
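
Eqs. 20 and 21 amount to a weighted histogram of the $t$-th coordinate of the samples. A minimal sketch follows, assuming uniformly distributed samples in a box and a toy two-bin Poisson posterior without smearing; the function name `binned_marginal`, the box limits and the number of samples are hypothetical choices for this example.

```python
import numpy as np

def binned_marginal(samples, weights, t, edges):
    """Binned marginal of T_t from weighted samples (Eqs. 20-21).

    samples: (N_s, N_t) array of T-points; weights: w_i for each sample;
    edges: fine bin edges along T_t (each bin has width 2 * delta).
    """
    W, _ = np.histogram(samples[:, t], bins=edges, weights=weights)
    return W / W.sum()  # normalized so that the bins sum to 1

rng = np.random.default_rng(2)
data = np.array([100.0, 50.0])
box_low, box_high = np.array([50.0, 20.0]), np.array([150.0, 80.0])

# Uniform samples in the box, weighted by L(D|T) * pi(T) with a flat prior.
samples = rng.uniform(box_low, box_high, size=(200_000, 2))
log_w = np.sum(data * np.log(samples) - samples, axis=1)
weights = np.exp(log_w - log_w.max())  # common rescaling, for stability

edges = np.linspace(box_low[0], box_high[0], 101)
P1 = binned_marginal(samples, weights, 0, edges)
```

Rescaling all weights by the same factor does not change the normalized result. For equally weighted samples one would pass an array of ones instead.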

If the $N_s$ samples are uniformly distributed in the T-space, then

$$
\lim_{N_s \to \infty} \frac{W(t,T_t)}{\text{const}} = P_t(T_t|\mathbf{D}),
\tag{22}
$$

where const is a normalization constant independent of $T_t$.

One needs to be careful with MCMC sampling (Sec. 5.2), because it does not
sample the T-space uniformly. This invalidates Eq. 22. To make it clear,
consider that the MCMC random walk has reached an equilibrium where the
samples are distributed according to $p(\mathbf{T}|\mathbf{D})$. Then,
$W(t,T_t)$ would converge towards $P_t(T_t|\mathbf{D})^2$, and
$\sum_i B(i,t,T_t)$ would converge towards $P_t(T_t|\mathbf{D})$ instead. For
this reason, only when the MCMC sampling method is used, $w_i$ is set to 1
for all samples. However, even this is not enough to ensure that
$P_t(T_t|\mathbf{D})$ is computed correctly, because there is no guarantee
that the MCMC has reached equilibrium. For this reason, MCMC is used in
combination with uniform sampling, for which Eq. 22 holds, in the way
explained in Sec. 5.4.

Finally, 2-dimensional marginal posteriors are obtained in a similar way,
showing $P_{t_1,t_2}(T_{t_1}, T_{t_2}|\mathbf{D})$, to visualize the
correlation between the contents of truth-level bins $t_1$ and $t_2$.

4 Note the different notation: $p_t$ is used for the marginal probability
density, and $P_t$ for the total marginal probability in fine bins of $T_t$.

5.4 Volume reduction

In Sec. 5.3 it is explained that MCMC is not as trustworthy for marginalizing
$p(\mathbf{T}|\mathbf{D})$ as uniform sampling. The latter, however, is not
as efficient when the sampled hyper-box is very large (Sec. 5.2).

For this reason, an optional procedure is used to reduce the volume of the
a-priori allowed hyper-box (Sec. 5.1). The initial hyper-box can be large
enough to be confident that it contains $\hat{\mathbf{T}}$. In such a large
volume, the MCMC random walk travels towards the region of large
$p(\mathbf{T}|\mathbf{D})$, and samples it. Soon MCMC approaches the
equilibrium mentioned in Sec. 5.3, and the marginal posteriors
$P_t(T_t|\mathbf{D})$ are determined with adequate accuracy for the purpose
of the next step: In each dimension $t \in [1, N_t]$, the shortest interval
containing 99% of $P_t(T_t|\mathbf{D})$ is found. Then, this interval is used
to redefine the boundaries of the allowed hyper-box in each dimension.
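
The box reduction can be sketched as follows. As a simplification that is an assumption of this example, the shortest 99% interval is found directly from the sorted, equally weighted MCMC samples in each dimension, rather than from a binned marginal; the function name `reduce_box` and the toy chain are hypothetical.

```python
import numpy as np

def reduce_box(chain, level=0.99):
    """Per-dimension shortest interval holding `level` of the samples.

    chain: (N_s, N_t) array of equally weighted T-points (w_i = 1).
    Returns the lower and upper edges of the reduced hyper-box.
    """
    n = chain.shape[0]
    k = int(np.ceil(level * n))  # number of samples each interval must hold
    lower = np.empty(chain.shape[1])
    upper = np.empty(chain.shape[1])
    for t in range(chain.shape[1]):
        x = np.sort(chain[:, t])
        # Width of every window that contains k consecutive sorted samples.
        widths = x[k - 1:] - x[:n - k + 1]
        i = int(np.argmin(widths))
        lower[t], upper[t] = x[i], x[i + k - 1]
    return lower, upper

# Toy chain standing in for MCMC output: two roughly Gaussian dimensions.
rng = np.random.default_rng(5)
chain = rng.normal([100.0, 50.0], [10.0, 7.0], size=(100_000, 2))
new_low, new_high = reduce_box(chain)
```

The reduced edges would then replace the limits of the box prior before the uniform sampling step.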

When the allowed hyper-box is reduced, and located where
$L(\mathbf{D}|\mathbf{T}) \cdot \pi(\mathbf{T})$ is large, it is easier to
use uniform random sampling.

Note that volume reduction is optional, 
and its motivation is merely practical. 

Also, in practice one can immediately see if the reduced hyper-box is too
narrow in some dimension, because the corresponding marginal posterior will
drop to 0 abruptly at the boundary of the hyper-box. It is obviously better
to not allow the boundaries of the hyper-box to interfere with the posterior,
especially if the chopping is drastic. Fortunately, this is easy to detect
and avoid in practice, when the posterior needs to be very precise even in
the tails.

6 Applications without regularization

Various examples of FBU follow, applying the 
devices described above, to understand the be- 
havior of FBU, and hopefully some more gen- 
eral characteristics of unfolding. No regular- 
ization is used yet; that will be the subject of 
Sec. 7. 

Some of the aspects to investigate are: 

1. High vs low statistics data. 

2. Low dimensionality (Nt) vs large. 

3. Heavy smearing vs little (or no) smear- 
ing. 

4. Having $N_r = N_t$ vs $N_r \neq N_t$.

5. Sampling strategies and how they affect convergence.

6. Building $\mathcal{M}$ with MC that follows the actual truth spectrum
($\tilde{\mathbf{T}} = \hat{\mathbf{T}}$), vs allowing some unexpected
features ($\tilde{\mathbf{T}} \neq \hat{\mathbf{T}}$).



6.1 No smearing, 2 bins, high statistics

The simplest example involves just two reco-level and truth-level bins
($N_r = N_t = 2$). Events are generated in the first 2 bins defined in
Eq. 12: $\{[m_0, m_1), [m_1, m_2)\}$. To have no smearing, $a$ and $b$ in
Eq. 14 are set to 0. The result is the 2×2 diagonal migrations matrix with
elements

$$
P(t=1, r=1) = 0.66, \quad
P(t=2, r=2) = 0.34, \quad
P(1,2) = P(2,1) = 0,
\tag{23}
$$

which has efficiency $\epsilon_1 = \epsilon_2 = 1$. Fig. 3 shows the input
data, and the initial sampled region, which is reduced (see Sec. 5.4) into
the one shown in Fig. 4.

The full result of FBU, p(T|D), is easy to 
visualize when Nt = 2, as in Fig. 5. Since no 
smearing is assumed, the reco-level spectrum 
is equal to the truth-level. The data are obvi- 
ously somewhat different, due to Poisson fluc- 
tuations. As a result of assuming no migra- 
tions, (i) the posterior probability distribution 
peaks around the observed data, and (ii) $T_1$
and $T_2$ are uncorrelated. In the bottom inset
of Fig. 4 it becomes clear that the unfolded 
spectrum differs from the actual truth-level 
spectrum as much as the data spectrum differs 
from the truth-level (and reco-level) spectrum 
(Fig. 3). 

6.2 No smearing, 2 bins, low statistics

All the settings of the example in Sec. 6.1 are kept, with only one
difference: the MC events are generated with 1000 times smaller weight,
resulting in the data spectrum of lower statistics in Fig. 6.

The unfolded spectrum, and the volume-reduced sampled region, are shown in
Fig. 7. It remains true, as in Sec. 6.1, that the unfolded spectrum differs
from the actual truth-level spectrum by as much as the data differ from the
reco-level (and truth-level) spectrum.

The full $p(\mathbf{T}|\mathbf{D})$ is in Fig. 8. Unlike Fig. 5, where
$p(\mathbf{T}|\mathbf{D})$ is a nearly perfect 2-dimensional Gaussian, in
Fig. 8, due to low data statistics, the asymmetric Poisson shape is visible
in both dimensions. The maximum likelihood, as in Sec. 6.1, remains at the
point $\mathbf{D} = (D_1, D_2)$, but here the difference between the
expectation value $E(\mathbf{T}) = (E(T_1), E(T_2))$ and the most likely
$\mathbf{T}$ is clear due to the asymmetry of $p(\mathbf{T}|\mathbf{D})$.

6.3 More smearing, in 2 bins 

To show the effect of smearing, various degrees of smearing will be applied,
and the corresponding migrations matrices will be used to perform FBU in a
spectrum with $N_t = N_r = 2$.

In Eq. 14, the parameter $a$ is kept at 0, and $b$ is given the values
{0.1, 0.3, 0.5, 0.8}, resulting in the migrations matrices in Fig. 9, and the
input spectra and the inferred $p(\mathbf{T}|\mathbf{D})$ in Fig. 10.

The correlation between $T_1$ and $T_2$ stays
the same, but the spread of the posterior in- 
creases quickly with smearing, and unlike the 
case without smearing, the spread is much 
greater than the statistical uncertainty of the 
data. This is a direct demonstration that, un- 
less some regularizing presumption is imposed 
through the prior (see Sec. 3), unfolding can 
not provide a precise answer. This is true not 
only for FBU, but for all unfolding methods; 
if the likelihood L(D|T) is so widely spread, 
there is no method that can recover the infor- 
mation lost with smearing, unless some exter- 
nal information is utilized, in the form of prior 
assumptions about the answer. 

Imprecise as the answer may be, it is at 
least accurate, in the sense that the correct 
T lies well within the unfolded spectrum er- 
ror bars, shown with the blue dashed lines in 



Fig. 10. This happens under extreme smear- 
ing, because the error bars are much larger. 
This can be untrue, though, in situations of 
very little (or no) smearing, as in Fig. 5 or 
Fig. 8, where a merely statistical fluctuation 
of the data by more than 1 standard deviation 
in one of the Nt = 2 dimensions is enough to 
drag the bulk of p(T|D) equivalently far from 
the correct T, while p(T|D) does not spread 
enough to keep including the correct T within 
its 68% core. 

Regarding the shape of $p(\mathbf{T}|\mathbf{D})$, while for little (or no)
smearing it resembles an $N_t$-dimensional Gaussian (assuming high event
counts), this is not true under heavy smearing. This happens because
$T_t > 0$, for all $t \in [1, N_t]$, so, when smearing increases,
$p(\mathbf{T}|\mathbf{D})$ gets chopped. This starts happening at different
amounts of smearing in each dimension. Fig. 11 shows this effect, through
1-dimensional marginal distributions.

As a final remark, the same results are obtained by fixing the smearing
parameter $b$ to 0, and increasing $a$ instead. The correlations between
$(T_{t_1}, T_{t_2})$ pairs stay the same. Of course, to attain the same
amounts of smearing, $a$ has to increase to about 10, due to the large
$\sqrt{m}$ denominator in Eq. 14.

6.4 $N_t = 14$ dimensions

A more realistic example, with $N_t = N_r = 14$ bins, is produced. The
smearing is as described in Sec. 4, i.e., assuming $(a, b) = (0.5, 0.1)$. The
migrations matrix is in Fig. 2, and the truth, reco, and data spectra are in
Fig. 1(b).

The initial sampling hyper-box (Sec. 5.1) is quite large, to minimize its
effect on the answer. Its limits in dimension $t$ (lower edge
$T_t^{\ulcorner}$, upper edge $T_t^{\urcorner}$) are:

$$
\begin{aligned}
T_t^{\ulcorner} &= \tilde{T}_t / (t + 1), & t \in [1, N_t] \\
T_1^{\urcorner} &= \tilde{T}_1 \cdot 2 \\
T_t^{\urcorner} &= T_1^{\urcorner}, & t \in [2, N_t]
\end{aligned}
\tag{24}
$$

and, since no extra processes are assumed,
$\tilde{\mathbf{T}} = \hat{\mathbf{T}}$ (see Sec. 1.1). Fig. 12(a) shows this
initial sampled hyper-box, which is then reduced (see Sec. 5.4) to the
hyper-box shown in Fig. 12(b).

The sampling of the reduced hyper-box proceeds with $10^7$ uniformly
distributed samples, which takes less than 30 seconds with a typical modern
CPU core. The final unfolded spectrum would be practically identical even
with $10^5$ samples, but $10^7$ are used for aesthetic reasons.

The 1-dimensional marginal distributions 
of p(T|D) are shown in Fig. 13, and the un- 
folded spectrum in Fig. 14. 

In 14 dimensions it is not possible to visualize the whole
$p(\mathbf{T}|\mathbf{D})$, but it is helpful to show some of its 91
2-dimensional marginal distributions in Fig. 15. The smearing is significant,
which enhances the (anti)correlation of the pair $(T_t, T_{t+1})$. This
(anti)correlation is weaker between $(T_t, T_{t+2})$, even weaker for
$(T_t, T_{t+3})$, etc. This happens because migrations are more rare between
bins that are farther apart.

6.4.1 MCMC sampling 

The advantage of MCMC is that fewer of its 
samples are taken at T points where $L(\mathbf{D}|\mathbf{T}) \approx 0$,
therefore the sampling is more efficient. For
the reason explained in Sec. 5.2, it may cause 
bias in the marginalization of p(T|D). The 
goal of this example is to examine this bias in 
practice. 

First, the initial sampling hyper-box shown 
in Fig. 12(a) is used. It is not reduced to 
a smaller volume, exploiting the ability of 
MCMC to navigate through large spaces towards the region of interest. With
$10^6$ MCMC samples, the 1-dimensional marginal distributions of Fig. 16 are
obtained.

Comparing Fig. 16 to Fig. 13, the statistical fluctuations are much smaller
in the former, using MCMC, even though the MCMC samples are $10^6$ instead of
$10^7$, a result of more efficient sampling. The shape of the distributions



in Fig. 16, though, seems to have unnatural 
anomalies, especially in truth-level bins with 
small event counts. 

The anomalies get worse if the sampling is 
limited to the reduced volume used in Sec. 6.4, 
and shown in Fig. 12(b). This has to do with 
the ability of the MCMC algorithm to reach 
equilibrium, which depends on the sampled 
volume and on the step size of the MCMC ran- 
dom walk (Sec. 5.2). Improvements are pos- 
sible, by adjusting the MCMC step size, but 
they are left for future study. 

The unfolded spectra are shown in Fig. 18, 
as they are found with and without reducing 
the sampled hyper-box volume. Qualitatively 
the results are similar, but not identical. More 
interestingly, they are similar (but not identi- 
cal) to the result using uniform random sam- 
pling in the reduced hyper-box, in Sec. 6.4, 
shown in Fig. 14. 

So, using MCMC instead of uniform ran- 
dom sampling has the advantage of speed and 
lower statistical fluctuations in the result, but 
it probably should not be used when emphasis 
is put on very detailed computations. Luck- 
ily, after volume reduction (Sec. 5.4), uniform 
sampling is not prohibitively slow, as shown in 
Sec. 6.4. When MCMC is used, to get quick 
results, it is recommended to inspect the 1- 
dimensional marginal distributions, and make 
sure they look reasonably smooth, especially 
in bins with large $T_t$ values. If this is not the
case, a different step size in MCMC may help. 
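
The inspection recommended above can be tried on a toy problem. The following is a minimal sketch, not the program used for the examples in this document. It assumes a hypothetical 2-bin problem, with an invented response matrix `P`, data `D`, hyper-box limits `lo` and `hi`, step size, and a flat prior inside the hyper-box. It runs a random-walk Metropolis chain, so that the acceptance rate and the smoothness of a 1-dimensional marginal distribution can be checked for different step sizes.

```python
import numpy as np

# Hypothetical toy inputs (not taken from the examples in the text).
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])          # P[t, r] = P(r|t)
D = np.array([100.0, 50.0])         # observed reco-level counts
lo = np.array([0.0, 0.0])           # lower corner of the sampled hyper-box
hi = np.array([300.0, 300.0])       # upper corner of the sampled hyper-box

def log_likelihood(T):
    """Poisson log L(D|T), up to a constant, with no background."""
    R = P.T @ T                     # expected reco-level counts
    if np.any(R <= 0.0):
        return -np.inf
    return float(np.sum(D * np.log(R) - R))

def metropolis(n_samples, step, seed=1):
    """Random-walk Metropolis with a flat prior inside [lo, hi]."""
    rng = np.random.default_rng(seed)
    T = 0.5 * (lo + hi)             # start at the centre of the hyper-box
    logL = log_likelihood(T)
    chain = np.empty((n_samples, T.size))
    n_accepted = 0
    for i in range(n_samples):
        proposal = T + rng.normal(scale=step, size=T.size)
        if np.all((proposal >= lo) & (proposal <= hi)):
            logL_new = log_likelihood(proposal)
            if np.log(rng.uniform()) < logL_new - logL:
                T, logL = proposal, logL_new
                n_accepted += 1
        chain[i] = T                # a rejected proposal repeats the current point
    return chain, n_accepted / n_samples

chain, acceptance = metropolis(n_samples=20000, step=10.0)
samples = chain[2000:]              # drop the first steps as burn-in
marginal_0, _ = np.histogram(samples[:, 0], bins=30)   # 1-dim marginal of the first bin
print("acceptance rate:", round(acceptance, 2))
print("posterior means:", samples.mean(axis=0))
```

Repeating the call with a much smaller or much larger `step` changes the acceptance rate and the raggedness of `marginal_0`, which is the kind of check suggested in the text.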

6.5 $N_r \neq N_t$

The goal of this example is to demonstrate the 
possibility of applying FBU even when the $N_r$
reco-level bins do not correspond to the $N_t$
truth-level bins. The computer program writ- 
ten to produce these FBU examples required 
no modification to tackle this extreme scenario. 

The truth-level distribution of MC events in m, with $N_t = 14$
bins, does not change, but they are reconstructed only in
$N_r = 5$ narrower bins shown in Fig. 19(a). Clearly the
migrations matrix (Fig. 19(b)) is not square, and its efficiency
(Fig. 19(c)) is zero in truth-level bins where smearing is not
capable of migrating events into the reconstructed bins.

The same initial sampled hyper-box is de- 
fined as in Eq. 24, which is reduced according 
to Sec. 5.4, with the result shown in Fig. 20. 
The volume reduction is not great, due to the
wide spread of $p_t(T_t|D)$ for some $T_t$ for which
the data provide no constraint.

In Fig. 21, all 1-dimensional marginal distributions of p(T|D)
are shown. The $T_t$ in bins t = {1, 2, 3, 10, 11, 12, 13, 14} are
undetermined, because the migrations matrix (Fig. 19(b)) does not
relate these $T_t$ values with any of the reconstructed data.
These are precisely the bins for which $\epsilon_t = 0$
(Fig. 19(c)). The resulting unfolded spectrum is in Fig. 22. In
bins that are unconstrained by the data, the unfolded spectrum is
a mere reflection of the flat prior, namely, of the arbitrary
sampled hyper-box. So, the only thing known about these
truth-level bins is the prior. This is what one would expect from
Bayes' theorem when the data are unrelated to the parameter of
interest. However, an inference is possible about $T_4$ and $T_5$
(Fig. 21), despite the lack of reconstructed bins corresponding
directly to the 4th and 5th truth-level bins (Fig. 19(a)); they
are constrained by the data only through possible migrations into
the m bins where data exist.
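
The statement that some truth-level bins are left unconstrained can be illustrated with a small sketch. The setup below is hypothetical and much smaller than the example of this section: 6 truth-level bins, 2 reco-level bins covering only the middle of the range, and a bounded uniform smearing chosen so that the outermost truth-level bins can never migrate into a reco-level bin. All names and numbers are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
truth_edges = np.linspace(0.0, 6.0, 7)     # N_t = 6 truth-level bins
reco_edges = np.array([2.0, 3.0, 4.0])     # N_r = 2 reco-level bins

# Hypothetical MC events with a bounded smearing of at most 0.6 units.
m_truth = rng.uniform(0.0, 6.0, size=20000)
m_reco = m_truth + rng.uniform(-0.6, 0.6, size=m_truth.size)

# M[t, r]: MC events generated in truth bin t and reconstructed in reco bin r.
# Events reconstructed outside the reco-level range are simply not counted.
M, _, _ = np.histogram2d(m_truth, m_reco, bins=[truth_edges, reco_edges])
T_mc, _ = np.histogram(m_truth, bins=truth_edges)

P = M / T_mc[:, None]                      # P(r|t); rows need not sum to 1
efficiency = P.sum(axis=1)                 # efficiency of each truth-level bin
unconstrained = np.flatnonzero(efficiency == 0.0)

# The expected reco-level counts do not depend on T in zero-efficiency bins,
# so the likelihood cannot constrain them.
T_a = np.array([50.0, 40.0, 30.0, 20.0, 10.0, 5.0])
T_b = T_a.copy()
T_b[unconstrained] *= 10.0
R_a, R_b = P.T @ T_a, P.T @ T_b
print("zero-efficiency truth bins (0-based):", unconstrained)
print("same expectation:", np.allclose(R_a, R_b))
```

Since `R_a` and `R_b` coincide, L(D|T) is identical for the two spectra, and the posterior in those bins can only reflect the prior.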

6.6 Unfolding a bump 

It is expected that unfolding can make a known bump, such as the
Z boson mass peak, sharper, as it is before smearing. In this
case, it is known that the bump is generated, so
$\tilde{T} = \hat{T}$, and $\mathcal{M}$ contains this bump
information. What if there is an exotic process, unknown to the
MC and to $\mathcal{M}$? Will unfolding then make the bump
sharper, or will it conceal it?

The scenario of unfolding an expected bump is investigated in
Sec. 6.6.1, and unfolding an unknown bump in Sec. 6.6.2.

As seen in the inset of Fig. 14(c), the pre- 
cision of unfolding deteriorates quickly in later 
bins. This happens partly because the Poisson 
distribution is wider for smaller mean values 
(see Sec. 6.1 and 6.2), and partly because of 
larger migrations, which magnify the impact 
of these statistical fluctuations (Sec. 6.3). To 
avoid such complications, and to focus on the 
bump, some changes are made in Sec. 6.6.1 and 
6.6.2: 

i) The spectrum is not steeply falling, but 
constant, with the addition of a bump. 

ii) The smearing is not taken from Eq. 14,
but a constant $\sigma$ is used, independent of
m.

iii) Instead of the 14 unequal bins of Eq. 12,
$N_t = 30$ bins are used, to have enough
bins to describe the bump. The m bins
span from 500 to 3500, in steps of 100.
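
The three changes listed above can be sketched as a toy MC setup. This is only an illustration under stated assumptions: the numbers of MC events (`n_flat`, `n_bump`), the random seed, and the function name `migrations_matrix` are hypothetical, and the bump parameters (mean 2000, RMS 100) are those quoted in Sec. 6.6.1.

```python
import numpy as np

edges = np.linspace(500.0, 3500.0, 31)     # 30 bins of width 100 in m

def migrations_matrix(sigma, n_flat=18000, n_bump=1000, seed=0):
    """Count MC events per (truth bin, reco bin) for a constant smearing sigma."""
    rng = np.random.default_rng(seed)
    flat = rng.uniform(edges[0], edges[-1], size=n_flat)   # constant spectrum
    bump = rng.normal(2000.0, 100.0, size=n_bump)          # Gaussian bump
    m_truth = np.concatenate([flat, bump])
    m_reco = m_truth + sigma * rng.standard_normal(m_truth.size)
    M, _, _ = np.histogram2d(m_truth, m_reco, bins=[edges, edges])
    return M

M_no_smearing = migrations_matrix(0.0)     # diagonal by construction
M_sigma50 = migrations_matrix(50.0)        # migrations populate off-diagonal cells
off_diagonal = M_sigma50.sum() - np.trace(M_sigma50)
print("off-diagonal MC events for sigma = 50:", off_diagonal)
```

With `sigma = 0` every event is reconstructed in its own truth-level bin, while any positive `sigma` moves part of the events to neighbouring bins.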

6.6.1 FBU with a known bump 

Initially, no smearing is assumed ($\sigma = 0$), which results
in the diagonal $\mathcal{M}$ in the first row of Fig. 23. The
truth-level spectrum contains a pronounced bump: a Gaussian of
mean 2000 and RMS 100. This bump is reflected in $\mathcal{M}$.
The reconstructed spectrum is identical to the truth-level one,
and it is shown, along with the observed data and the sampled
hyper-box, in the middle of the first row of Fig. 23. FBU is
performed, without any regularization. In the interest of speed,
MCMC sampling is used, without volume reduction. The resulting
1-dimensional distributions are very regularly shaped, and
smooth, which suggests that this approximation is satisfactory
(see Sec. 6.4.1).

The unfolded spectrum is compared to the truth-level spectrum on
the right of the first row of Fig. 23. The bottom inset shows the
relative uncertainty of the unfolded spectrum, which is equal to
the data fluctuations. The unfolded spectrum is effectively
identical to the data, and its relative uncertainty in bin t is
about equal to $\sqrt{D_t}/D_t$, which in the sidebands of the
bump is about $1/\sqrt{600} \simeq 4\%$, and in the bump, where
statistics are greater, it is smaller. This result is consistent
with Sec. 6.1.

The same FBU procedure is repeated for gradually increasing
smearing. The rows of Fig. 23 correspond to
$\sigma$ = {0, 50, 75, 100, 150} respectively. The relative
uncertainty of the unfolded spectrum increases from 4% to roughly
9%, 20%, 40% and 60%, with wild fluctuations covarying with Ut.
With $\sigma = 50$ and uncertainty ~9% (2nd row of Fig. 23), the
bump is still visible, and it seems slightly sharper than the
reco spectrum, though the error bars of the unfolded spectrum are
large enough to make it also consistent with being less sharp
than the reco spectrum. So, it is not clear that in this example
unfolding made the feature sharper. If it made it a little
sharper, it also made the error bars big enough to cancel this
benefit. When $\sigma = 50$, the smearing is not very strong, so
it may be thought that unfolding could demonstrate its benefits
more clearly when smearing is stronger. Unfortunately, with
$\sigma \geq 75$ (3rd, 4th, 5th row in Fig. 23), the bump is
hardly discernible in the unfolded answer, because the error bars
are about as large as the bump itself.

So, it seems that, if FBU is making the bump in Fig. 23 sharper,
it is simultaneously increasing the uncertainty of the unfolded
spectrum so much that the feature is less obvious.

To see if this behavior is different for 
sharper truth-level bumps, the previous Gaus- 
sian is replaced with one with mean 2050 and 
RMS 5, so, at truth-level it populates just one 
bin. The results are summarized in Fig. 24. 
Just as in Fig. 23, smearing with $\sigma \geq 75$ is
enough to make the error bars about as big 
as the feature itself, when no regularization is 
used. 



6.6.2 FBU with an unexpected bump 

A truth-level spectrum similar to the one in Sec. 6.6.1 is
generated. MC events are then smeared, to form the expected
reconstructed spectrum, which is then allowed to fluctuate
according to Poisson, to produce pseudo-data similar to those in
Fig. 23 and 24. The difference, in this section, is that the MC
events used to compute $\mathcal{M}$ follow a flat truth-level
spectrum $\tilde{T}$, which differs from the actual truth
spectrum $\hat{T}$.

Fig. 25 and 26 present the cases where the 
unexpected bump has width 100 and 5 respec- 
tively, analogously to Fig. 23 and 24 discussed 
in Sec. 6.6.1. The same amounts of smearing 
are shown in each row of these figures, namely 
$\sigma$ = {0, 50, 75, 100, 150}.

From Fig. 25 and 26, it seems that the un- 
folded spectrum maintains traces of the bump. 
For a narrow bump, and $\sigma > 100$, it seems
that the unfolded spectrum is enhanced in bin 
t = 16, so, by eye at least, the feature is 
more distinguishable in the unfolded spectrum 
than in the data. However, when smearing is 
greater, the uncertainty of the unfolded spec- 
trum grows, making it impossible to see the 
feature. 

FBU does not hide this unexpected feature, when performed without
regularization, because Eq. 4 does not directly use the elements
of $\mathcal{M}$, P(t,r), but instead the conditional
probabilities P(r|t), which are independent of the population in
each truth-level bin t. On the other hand, when smearing is
significant, and no regularization is used, the posterior p(T|D)
can be so wide that the feature is obscured.
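
The independence of P(r|t) from the truth-level population can be checked numerically. The sketch below is an illustration under assumptions, not the code behind the figures: it uses expected (not sampled) MC populations, a Gaussian smearing kernel evaluated at bin centres with a hypothetical constant width of 75, and two invented MC truth-level spectra, one flat and one with a bump.

```python
import numpy as np
from math import erf, sqrt

edges = np.linspace(500.0, 3500.0, 31)
centers = 0.5 * (edges[:-1] + edges[1:])
sigma = 75.0                                   # assumed constant smearing

def gauss_cdf(x, mu, s):
    return 0.5 * (1.0 + erf((x - mu) / (s * sqrt(2.0))))

# kernel[t, r]: probability that an event at the centre of truth bin t
# is reconstructed in reco bin r.
kernel = np.array([[gauss_cdf(edges[r + 1], c, sigma) - gauss_cdf(edges[r], c, sigma)
                    for r in range(30)] for c in centers])

def joint_and_conditional(T_mc):
    p_t = T_mc / T_mc.sum()                    # P(t): truth-level population fractions
    joint = p_t[:, None] * kernel              # P(t, r), what the migrations matrix estimates
    conditional = joint / p_t[:, None]         # P(r|t) = P(t, r) / P(t)
    return joint, conditional

T_flat = np.full(30, 600.0)
T_bump = T_flat + 400.0 * np.exp(-0.5 * ((centers - 2000.0) / 100.0) ** 2)

joint_flat, cond_flat = joint_and_conditional(T_flat)
joint_bump, cond_bump = joint_and_conditional(T_bump)
print("joint probabilities equal:      ", np.allclose(joint_flat, joint_bump))
print("conditional probabilities equal:", np.allclose(cond_flat, cond_bump))
```

The joint probabilities differ between the two MC spectra, while the conditional ones coincide, which is why a feature missing from the MC is not erased by the likelihood itself.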

Later, in Sec. 7.2, an example with a much larger unexpected bump
on a steeply falling spectrum is given. There, it will be more
visible that the unfolded spectrum has a narrower bump than the
reconstructed spectrum, so it undoes the effect of smearing in
the bump, but for this to be visible the bump has to be much
larger than the posterior's spread (see Fig. 46(a), and the
result with $\alpha = 0$ in any of Fig. 53, 52, 50, or 48).



7 Regularization

It is demonstrated in Sec. 6 that, when $\pi(T)$ is constant
within the sampled hyper-box, and smearing is significant, the
posterior p(T|D) is too widely spread. So, the unfolded spectrum
has large error bars. Only if smearing is zero do these error
bars reduce to the level of $\sqrt{T_t}$.

This observation makes it clear that the posterior's information
content is limited by (i) lack of data, and (ii) smearing. Just
as limited data are an insurmountable limitation, so is smearing.
Information cannot be recovered after smearing, just as it cannot
be made up if the data are limited, unless information comes in
from somewhere beyond the data and beyond what is known about
migrations. This is possible by shaping $\pi(T)$, or through
parametric estimation of T, i.e., by fitting a function through
D.

FBU makes it easy to try various regular- 
ization choices, without changing the formu- 
lation of the problem. Choices are unlimited, 
but the following few are studied here: 

i) S(T) is the entropy [2], multiplied by $-1$
for reasons of convention explained below:

(25)

ii) S(T) is the curvature (Eq. 39 in Ref. [5]):

$$S_2(T) \equiv \sum_{t=2}^{N_t-1} \left(\Delta_{t+1,t} - \Delta_{t,t-1}\right)^2, \qquad (26)$$

where

$$\Delta_{t_1,t_2} = T_{t_1} - T_{t_2}. \qquad (27)$$

iii) S(T) is a function that sums up the relative
variations of the first derivative, taking into account
the possibly varying bin sizes:

(28)

(29)

(30)

is the width of bin t, and

$$C_t = \frac{1}{2}(m_t + m_{t-1}) \qquad (31)$$

is the center of bin t.

iv) The prior is proportional to a multivariate
Gaussian ($\pi_G$), without correlations, which
disfavors T-points far from $\tilde{T}$ (Sec. 1.1):

$$\pi_G(T) = \prod_{t=1}^{N_t} \exp\left(-\frac{(T_t - \tilde{T}_t)^2}{2(\tilde{T}_t/\alpha)^2}\right). \qquad (32)$$

In this case, the parameter $\alpha$ adjusts the
RMS of the Gaussian, which is set to $\tilde{T}_t/\alpha$
in dimension t.

In all the above cases, except for the Gaus- 
sian prior (iv), within the sampled hyper-box 
the prior is given by Eq. 10. 

When $\alpha = 0$, no regularization applies. The larger the
$\alpha$, the stronger the prior belief that S(T) must be small
(or, in case iv, the stricter the Gaussian constraint). It is
informative to try various values. The result may not be
satisfactory, if the bias introduced is unacceptable and the
uncertainty reduction is small. An exception is the choice iv,
where, if $\tilde{T}$ happens to be the correct truth-level
spectrum ($\hat{T}$), then larger $\alpha$ values reduce both the
posterior uncertainty and bias; the posterior is concentrated
closer to $\hat{T}$. Of course, it is difficult to trust that
$\tilde{T} = \hat{T}$, because that presumes a perfect MC
simulation, and absence of exotic processes.
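
Two of the regularization choices listed above are simple enough to sketch in code. The functions below are an illustration, not the implementation used for the figures: `curvature` follows the curvature function of choice ii (Eq. 26 with the differences of Eq. 27), and `log_gaussian_prior` is the logarithm of the Gaussian prior of choice iv (Eq. 32), up to a constant. The function names and the test spectra are hypothetical. How S(T) enters the prior is fixed by Eq. 10, which is not repeated here.

```python
import numpy as np

def curvature(T):
    """Sum over interior bins of the squared second differences of T."""
    T = np.asarray(T, dtype=float)
    delta = T[1:] - T[:-1]                  # differences between adjacent bins
    return float(np.sum((delta[1:] - delta[:-1]) ** 2))

def log_gaussian_prior(T, T_mc, alpha):
    """log of the uncorrelated Gaussian prior centred on T_mc with RMS T_mc/alpha."""
    T = np.asarray(T, dtype=float)
    T_mc = np.asarray(T_mc, dtype=float)
    # Written with alpha in the numerator so that alpha = 0 gives a flat prior.
    return float(-0.5 * alpha ** 2 * np.sum(((T - T_mc) / T_mc) ** 2))

print(curvature([1.0, 2.0, 3.0, 4.0]))             # a straight line has no curvature
print(curvature([0.0, 0.0, 1.0, 0.0, 0.0]))        # a one-bin spike is penalized
print(log_gaussian_prior([110.0, 90.0], [100.0, 100.0], alpha=2.0))
```

A larger `alpha` makes `log_gaussian_prior` fall more steeply away from `T_mc`, which is the sense in which it tightens the constraint.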

In the interest of speed, the sampling 
method in the following regularization ex- 
amples is MCMC, without volume reduc- 
tion. The 1-dimensional marginal distribu- 
tions of p(T|D) are smooth, indicating that 
this approximation is satisfactory, as shown in 
Sec. 6.4.1. 

7.1 Steeply falling spectrum 

Sections 7.1.1 and 7.1.2 assume a steeply 
falling spectrum with $N_t = 14$ bins, as in Sec. 4.
In Sec. 7.1.1 no smearing is assumed, and 
in 7.1.2 smearing is applied with parameters 
(a, b) = (0.5,0.1), as in Sec. 4. Multiple regu- 
larization attempts will be made in each case, 
in an attempt to build intuition. 

7.1.1 Without smearing 

The spectrum used in this section is in Fig. 27, 
where no smearing is assumed. 

The first attempt is to use $S_1$ (Sec. 7). The parameter
$\alpha$ is varied, from 0 to $3 \times 10^3$. Fig. 28
demonstrates that increasing $\alpha$ results in a p(T|D) that
favors T points with smaller $S_1(T)$. The resulting unfolded
spectra are shown in Fig. 29. The posterior becomes highly biased
with respect to the actual truth-level spectrum ($\hat{T}$), and
the reduction in its spread is small.$^5$ It seems that, when
there is no smearing to enhance the spread of the posterior,
there is not much uncertainty for regularization to reduce. In
Sec. 7.1.2, where smearing is activated, the reduction in
posterior spread is significant.

5 In some bins, the uncertainty even increases, which could be
due to statistical fluctuations in sampling.

The next attempt is with $S_2$. The posterior probability
distribution in $S_2$ is in Fig. 30, and the unfolding results in
Fig. 31. This regularization only affects the first bins, and, as
expected, increases their bias and slightly reduces their
uncertainty.

The attempt with $S_3$ is similarly shown in Fig. 32 and 33. The
posterior spread is significantly reduced in the last few truth
bins, at the cost of significant bias.

In Fig. 34 are the results when regularization is made through a
Gaussian constraint, of varying RMS, controlled by $\alpha$ as
explained in Sec. 7. The first row in Fig. 34 confirms simply
that the prior constrains the unfolded spectrum near the
truth-level spectrum known from the MC ($\tilde{T}$), which is
used to construct the migrations matrix. The posterior's spread
can be constrained arbitrarily by increasing $\alpha$, and in
this idealized case there is no bias cost, because the MC
truth-level spectrum is, by construction, the correct one
($\tilde{T} = \hat{T}$).

Fig. 35 and 36 show some of the 1-dimensional and 2-dimensional
marginal distributions of p(T|D), for some of the regularization
options tried in this section. Two effects are notable: (i) Even
though there is no smearing, regularization can cause
correlations. (ii) For some regularization functions and
parameters (e.g. $(S, \alpha) = (S_3, 20)$), it is possible to
induce secondary maxima in the posterior, something not observed
without regularization.

7.1.2 With smearing

The spectrum used in this section is the same one used in
previous sections, and is shown in Fig. 12(a). The initially
sampled hyper-box is the same shown there, and defined in Eq. 24.
MCMC is used for sampling, without volume reduction.

Like in Sec. 7.1.1, the regularization with negative entropy
($S_1$) is tried first. The results are shown in Fig. 37 and 38.
The conclusions are similar to those in Sec. 7.1.1, with the
difference that the posterior spread is large prior to
regularization, due to smearing, and it is greatly reduced by
regularization.

The results of using $S_2$ are shown in Fig. 39 and Fig. 40.
Comparing Fig. 40(d) to Fig. 31(d) shows again that smearing
increases the spread of the posterior. That allows
regularization, in the presence of smearing (Fig. 40), to reduce
significantly the posterior spread to levels almost as low as
those in Sec. 7.1.1 without smearing, at the cost of significant
bias.

The results with $S_3$ are shown in Fig. 41 and Fig. 42. It is
clear that the tendency is to make the unfolded spectrum more
constant. An interesting effect is observed in Fig. 41(c), where
the regularization condition is stronger ($\alpha = 40$). The
posterior p(T|D) favors two (at least) categories of T-points,
some with $\log_{10} S_3(T) \simeq 5.5$ and some with
$\log_{10} S_3(T) \simeq 4.5$. This suggests that p(T|D) has
secondary local maxima in its 1-dimensional marginal
distributions, which is indeed confirmed in Fig. 43. The
secondary maximum corresponds to T spectra that have plateaus in
different groups of bins, as shown in Fig. 44.

The results with Gaussian regularization 
are shown in Fig. 45, with similar behavior as 
in Sec. 7.1.1. 

It should not be surprising if, with regu- 
larization, the unfolded spectrum has smaller 
uncertainty than the data statistical uncer- 
tainty; external information can cause this. It 
is obvious, at least in the Gaussian regular- 
ization, that the spread of the posterior can 
become arbitrarily small, regardless of avail- 
able statistics. The situation is analogous to 
fitting a straight line through many points, 
some of which have small statistical uncer- 
tainty, and few of which have large uncer- 
tainty. The straight line is defined by two pa- 
rameters, which are mostly constrained by the 
points with small uncertainty. Even at posi- 
tions where the data have great uncertainty, 
the fitted function will have small uncertainty, 
thanks to the external information that the an- 
swer must be a straight line. 
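
The straight-line analogy can be made concrete with a short sketch. The points, their uncertainties and the line parameters below are hypothetical; the sketch performs a standard weighted least-squares fit and propagates the covariance of the two fitted parameters to the fitted line at each position.

```python
import numpy as np

x = np.arange(10.0)
sigma_y = np.full(10, 1.0)
sigma_y[-1] = 20.0                      # the last point has a large uncertainty
y = 2.0 + 0.5 * x                       # hypothetical measurements lying on a line

A = np.column_stack([np.ones_like(x), x])     # design matrix for y = a + b x
W = np.diag(1.0 / sigma_y ** 2)               # weights from the data uncertainties
cov = np.linalg.inv(A.T @ W @ A)              # covariance of the fitted (a, b)
params = cov @ A.T @ W @ y                    # weighted least-squares solution

# Uncertainty of the fitted line at each x: sqrt(A cov A^T) on the diagonal.
fit_sigma = np.sqrt(np.einsum("ij,jk,ik->i", A, cov, A))
print("data uncertainty at the last point:", sigma_y[-1])
print("fit uncertainty at the last point: ", fit_sigma[-1])
```

At the last position the fitted line is far better determined than the data point itself, purely because of the external assumption of a straight line.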



7.2 Spectrum with an unexpected 
bump 

In this section, regularization is tested in the 
presence of an unknown, smeared bump. 

MC events are generated following a steeply falling truth-level
spectrum with $N_t = 28$ bins, with a prominent Gaussian bump of
mean m = 1500 and RMS = 50. Smearing with parameters
(a, b) = (0.5, 0.1) applies. The migrations matrix is built with
MC events where the bump is missing; thus, the bump is an unknown
feature. Fig. 46(a) shows the truth-level spectrum with and
without the bump, the reconstructed spectrum without the bump,
which consists of the MC events that populate the migrations
matrix (Fig. 46(b)), and the data. Also shown is the sampled
hyper-box, which is bigger than in Eq. 24 to accommodate the
actual truth-level spectrum (with the bump) $\hat{T}$, and is
defined by:

$$T_t^{\ulcorner} = \tilde{T}_t/(t+1), \quad t \in [1, N_t]$$

$$T_t^{\urcorner} = \tilde{T}_t \cdot (t+2), \quad t \in [1, N_t] \qquad (33)$$

where $\tilde{T}$ is the truth-level spectrum without the bump,
used to populate the migrations matrix.
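
As a rough illustration, the hyper-box just defined can be computed and sampled as follows. The sketch reads the definition above as a lower edge equal to the MC truth-level content divided by (t+1) and an upper edge equal to the same content multiplied by (t+2), with t counted from 1; this reading, the invented spectrum `T_mc`, and the function name `hyperbox_bounds` are assumptions.

```python
import numpy as np

def hyperbox_bounds(T_mc):
    """Lower and upper edges of the sampled hyper-box for each truth-level bin."""
    T_mc = np.asarray(T_mc, dtype=float)
    t = np.arange(1, T_mc.size + 1)          # 1-based bin index, as in the text
    return T_mc / (t + 1), T_mc * (t + 2)

# Hypothetical steeply falling MC truth-level spectrum with 28 bins.
T_mc = 1.0e5 * np.exp(-0.4 * np.arange(28))
lower, upper = hyperbox_bounds(T_mc)

# Uniform sampling of T points inside the hyper-box.
rng = np.random.default_rng(0)
samples = rng.uniform(lower, upper, size=(1000, T_mc.size))
print("first bin range:", lower[0], upper[0])
print("last bin range: ", lower[-1], upper[-1])
```

The box widens with the bin index, leaving more room in the later bins, where an unexpected feature could exceed the MC expectation by a large factor.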

Unfolding with negative entropy regularization
($S_1$) is shown in Fig. 47 and 48. As seen
previously (see Sec. 7.1.2), regularization with 
entropy seems to distort the unfolded spectrum 
significantly. 

The result of using the regularization function $S_2$ is shown in
Fig. 49 and 50. The spread of the posterior is greatly reduced,
especially in the first truth-level bins. The unfolded spectrum
seems to have a bump, but it is significantly wider than it is at
truth-level; it has width similar to the data, after smearing.

The result of using the regularization function $S_3$ is shown in
Fig. 51 and 52. The unfolded spectrum tends to be flat in long
intervals, thus obscuring the shape of the bump.

Finally, FBU with a Gaussian constraint is shown in Fig. 53. As
expected, a stronger constraint leads to a narrow posterior which
concentrates around $\tilde{T}$ instead of $\hat{T}$. Mild values
of $\alpha$, such as $\alpha = 1$, don't tend to make the bump in
the unfolded spectrum as narrow as it is at truth-level; the bump
maintains its reco-level width.

None of the regularization methods tried seems to improve how the
bump appears in the unfolded spectrum. Its shape is either
obscured, or it remains as wide as it is in the data, after
smearing. Without regularization, nevertheless, the unfolded
spectrum peaks in a narrow region, similar to how narrow the bump
is in $\hat{T}$ (see examples with $\alpha = 0$ in any of
Fig. 53, 52, 50, or 48). The problem is that the posterior is
spread wide without regularization, resulting in large error bars
in the unfolded spectrum.

7.3 Spectrum with an expected 
bump 

In this section, regularization is tested in the 
presence of a known, smeared bump.

The data contain the same feature as in Sec. 7.2, except that
here the MC contains the same feature at truth-level
($\tilde{T} = \hat{T}$), so the migrations matrix is aware of it.
The input spectra, the migrations matrix, and the sampled region
are shown in Fig. 54. The sampled region is defined, as before,
by Eq. 33.

Unfolding with $S_1$ is shown in Fig. 55 and 56. The results are
not much different from Sec. 7.2, Fig. 48, where the bump was
unexpected. Entropy demands the unfolded histogram to be closer
to horizontal, and the truth-level width of the bump is not
resolved with this regularized unfolding. It seems better
resolved without regularization ($\alpha = 0$), but the spread of
the posterior is then larger.

The result of using the regularization function $S_2$ is shown in
Fig. 57 and 58. The results are similar to Sec. 7.2, Fig. 50,
where the bump was unexpected.

The result of using the regularization function $S_3$ is shown in
Fig. 59 and 60. The results are similar to Sec. 7.2, Fig. 52.

Finally, FBU with a Gaussian constraint is shown in Fig. 61. As
expected, a stronger constraint leads to a narrow posterior which
concentrates around $\tilde{T}$, which is by construction the
same as $\hat{T}$, which means that the posterior narrows down to
$\hat{T}$ with increasing values of $\alpha$. This,
unfortunately, is too ideal to be realistic. If one already knows
$\hat{T}$, then there is no need for unfolding, or even for any
data. In reality $\tilde{T}$ will not be known to be exactly
equal to $\hat{T}$, even for expected bumps (e.g. the Z boson
mass peak).

None of the regularization methods tried seems to improve how the
bump appears in the unfolded spectrum, even when the bump is
known at the MC level. The only exception is the Gaussian
constraint, which, as discussed, is not a realistic scenario; how
close $\tilde{T}$ is to $\hat{T}$ would have to be considered as
a systematic uncertainty.

From the examples in Sec. 7.2 and 7.3, it seems that
regularization indeed reduces the spread of the posterior, but it
increases the bias significantly, distorting a feature such as a
bump. The only regularization condition that could help resolve a
feature more finely than it appears after smearing is one with an
S(T) tailored to the actual truth-level spectrum $\hat{T}$ (e.g.
a Gaussian constraint towards $\tilde{T}$, which is necessarily
assumed to be $\simeq \hat{T}$). If an unexpected feature is
present, this is not possible (since $\tilde{T} \neq \hat{T}$),
so it is difficult to ensure that the regularization condition
will reflect an actual property of $\hat{T}$. If it doesn't
reflect an actual property of $\hat{T}$ (e.g., if
$S(T) = S_1(T)$ in the presence of a bump), then the feature may
be very distorted by regularization, even if the feature was
expected. It would, maybe, be preferable to apply no
regularization, i.e., to assume a constant prior, in which case
the feature will not be distorted, and the truth-level (narrow)
bump will be estimated without bias, even if the bump is
unexpected. But the spread of the posterior will be so big that
the feature will probably not be any clearer than before
unfolding.

8 Interpretation of unfolding 

It is a common misconception that the result 
of unfolding is "corrected data", which can be 
used to set limits and test hypotheses as if it 
was data, provided only that some attention is 
paid to the covariance between bins. SVD and
iterative unfolding, which provide an estimator
(i.e., a single T) and a covariance matrix, make
this misconception easier, when $N_t = N_r$,
because the user inputs a histogram D and gets
another similar histogram as output.

The origin of this misconception is that the data (D) are often
thought to have uncertainty in each bin, which is uncorrelated
between independent bins. From this one can easily be led to
think that unfolding may introduce bin correlations but, apart
from that complication, its answer can be used as a "corrected"
replacement of the original D.

The impression that data (D) have uncertainty is wrong to start
with. When one observes $D_r$ events in the r-th reco-level bin,
there is no statistical uncertainty about how many events were
counted, and, assuming we know how to count correctly, there is
no systematic uncertainty either. It is customary to plot D with
error bars equal to $\sqrt{D_r}$ in bin r, and this is what makes
people think that D has an uncertainty. Variance, however, is
only a property of probability distributions, not of actual
observations.

How it became customary to draw error bars of $\sqrt{D_r}$ around
D is a question for the historian of science, but the
interpretation of these error bars should be the following. When
we observe $D_r$ independent random events in bin r, this number
is assumed to be pulled from a Poisson distribution with mean
$R_r$. One doesn't know $R_r$, but can try to infer it. To do so
one can classically construct the maximum likelihood estimator
(MLE), $\hat{R}_r$, which maximizes the likelihood

$$L(D_r|R_r) = \frac{R_r^{D_r}}{D_r!} e^{-R_r}. \qquad (34)$$

Maximizing this results in $\hat{R}_r = D_r$. This MLE is an
unbiased estimator of $R_r$, because its expectation value is

$$E(\hat{R}_r) = E(D_r) = R_r. \qquad (35)$$

The standard deviation of a maximum likelihood estimator is
estimated as explained in [6] (Fig. 6.4), and it reflects the
width of $L(D_r|R_r)$ near its maximum. This is totally
equivalent to solving Bayes' equation for a constant prior
$\pi(R_r)$:

$$p(R_r|D_r) \propto L(D_r|R_r). \qquad (36)$$

As mentioned in Sec. 3, the classical MLE is nothing but the mode
of the Bayesian posterior, if the prior is assumed constant, and
the variance of the MLE reflects the width of this posterior. If
$D_r$ is a large enough number, then the above $p(R_r|D_r)$ is
approximated well by a Gaussian of mean $D_r$ and standard
deviation $\sqrt{D_r}$. Similarly, the MLE of $R_r$ is
$\hat{R}_r = D_r$, which suggests$^6$ that, if indeed
$R_r = D_r$, then if the experiment were repeated infinitely many
times these hypothetical data would be distributed like a
Gaussian of mean $D_r$ and standard deviation $\sqrt{D_r}$. So,
when the data $D_r$ are drawn with error bars of size
$\sqrt{D_r}$ it should be understood that these error bars don't
refer to the standard deviation of $D_r$, since such a thing is
not defined, but they refer to the standard deviation of
$p(R_r|D_r)$, or of the distribution suggested by the estimator
$\hat{R}_r$. The mode of $p(R_r|D_r)$ happens to be numerically
equal to $D_r$, so the error bars are centered at $D_r$, but
that's about all they have to do with $D_r$ itself. They reflect
an inference about $R_r$. If one wants to test how compatible
$D_r$ is with a theoretical hypothesis which predicts an expected
number of events $R_r$, then the customary error bar of
$\sqrt{D_r}$ is irrelevant. The actual likelihood of $D_r$, under
the hypothesis of $R_r$, is given by

$$\frac{R_r^{D_r}}{D_r!} e^{-R_r} \simeq \frac{1}{\sqrt{2\pi R_r}}\, e^{-\frac{(D_r - R_r)^2}{2R_r}} \qquad (37)$$

and not by

$$\frac{1}{\sqrt{2\pi D_r}}\, e^{-\frac{(D_r - R_r)^2}{2D_r}}. \qquad (38)$$

The correct way to visualize this comparison would be to plot
$D_r$ without any error bars, and then superimpose the value
$R_r$ surrounded by error bars of size $\sqrt{R_r}$, if $R_r$ is
large enough to justify the Gaussian approximation. More on this
in [7, 8].

6 It is understood that $\hat{R}_r \neq R_r$, but only
$E(\hat{R}_r) = R_r$. So, loosely speaking, this suggestion would
be valid on average, in a frequentist sense.

So, the data (D) never had uncertainty to 
begin with, and it is misleading to think that 
the result of unfolding is just "corrected data" 
with bin correlations. 
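
The difference between the two expressions numbered (37) and (38) can be checked with a few numbers. The sketch below uses a hypothetical observation of 5 events where 12 are predicted, and compares the exact Poisson likelihood with two Gaussian approximations: one whose variance is the predicted $R_r$, and one whose variance is the observed $D_r$, as the customary error bar would suggest. The function names are invented for this illustration.

```python
import math

def poisson_likelihood(D, R):
    """Exact probability of observing D events when R are expected."""
    return math.exp(D * math.log(R) - R - math.lgamma(D + 1))

def gaussian(D, R, variance):
    """Gaussian density at D, with mean R and the given variance."""
    return math.exp(-(D - R) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

D, R = 5, 12.0                               # hypothetical observation and prediction
exact = poisson_likelihood(D, R)
approx_R = gaussian(D, R, variance=R)        # variance taken from the hypothesis
approx_D = gaussian(D, R, variance=D)        # variance taken from the observation
print("exact Poisson:         ", exact)
print("Gaussian, variance R_r:", approx_R)
print("Gaussian, variance D_r:", approx_D)
```

In this example the Gaussian with variance $R_r$ is within about 20% of the exact value, while the one with variance $D_r$ underestimates it by roughly an order of magnitude.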

8.1 Using p(T|D) to estimate parameters at truth-level

To set limits on the expected number of reconstructed events of a
hypothetical signal, s, one needs to compute

$$p(s|D) \propto L(D|s) \cdot \pi(s). \qquad (39)$$

This is very simple to do at reco-level, if it is known how the
signal is reconstructed, namely, if a model of the detector
response is available. Details can be found in Ref. [11].

A reason unfolding is used in HEP is to 
make it easy for theorists to test their theories 
without having a model of the detector. Here 
it will be examined how this can be done with 
FBU, and under what conditions it is correct. 



Let's assume that the Standard Model (SM), at truth-level, at the
luminosity corresponding to the analyzed data, predicts a
spectrum $T^{SM}$, and the assumed new physics (NP) adds on top
of that a spectrum $T^{NP}$. For simplicity, possible destructive
interference is ignored. For example, $T^{SM}$ could be a steeply
falling spectrum, and $T^{NP}$ be a bump. Let's define as
parameter of interest the signal cross-section in units of pb,
denoted by $\sigma$. To make $\sigma$ explicitly appear in the
equations, it is convenient to write $T^{NP}$ as
$\sigma \cdot T^{SNP}$, where $T^{SNP}$ has the same shape as
$T^{NP}$, but is scaled$^7$ to the integrated luminosity that
corresponds to 1 pb$^{-1}$. Let's further assume that the
response matrix for SM is $\mathcal{P}^{SM}$ and for NP it is
$\mathcal{P}^{NP}$. The elements of a response matrix (see
Sec. 1.1) reflect both the probability of being smeared from one
bin into another, and the probability to be reconstructed in any
of the considered reco-level bins.

To infer the $\sigma$ of this hypothetical signal one needs to
compute

$$p(\sigma \,|\, D \wedge \mathcal{P}^{SM} \wedge \mathcal{P}^{NP} \wedge T^{SNP} \wedge T^{SM}). \qquad (40)$$

For brevity, let's denote with K the condition
$\mathcal{P}^{SM} \wedge \mathcal{P}^{NP} \wedge T^{SNP} \wedge T^{SM}$.
Then,

$$p(\sigma \,|\, D \wedge K) \propto L(D \,|\, \sigma \wedge K) \cdot \pi(\sigma \wedge K). \qquad (41)$$

Assuming that all components of K are not uncertain, the joint
prior $\pi(\sigma \wedge K)$ can be written as a product of
$\pi(\sigma)$ and $\delta$-functions that pinpoint each component
of K to its known value. For brevity of notation, we can omit K,
just like $\mathcal{M}$ was omitted from Eq. 1. It is, of course,
remembered that K is silently assumed.

$$p(\sigma|D) \propto L(D|\sigma) \cdot \pi(\sigma). \qquad (42)$$

How can this be computed using the output of FBU? The information
given to the theorist is p(T|D), from Eq. 2. From this, and the
stated regularization ($\pi(T)$), the theorist can extract

$$L(D|T) \propto \frac{p(T|D)}{\pi(T)}. \qquad (43)$$



7 SNP stands here for "scaled new physics".

Assuming K and a value of $\sigma$, the expected reconstructed
events are, in analogy to Eq. 8, given by

$$R = B + (\mathcal{P}^{SM})^T T^{SM} + \sigma (\mathcal{P}^{NP})^T T^{SNP}. \qquad (44)$$

The likelihood of D under this assumption ($L(D|\sigma)$, in
abbreviated notation) can now be computed by the theorist, by
evaluating Eq. 43 at the T which corresponds to the same R as the
hypothesized NP. Namely,

$$R = B + \mathcal{P}^T T = B + (\mathcal{P}^{SM})^T T^{SM} + \sigma (\mathcal{P}^{NP})^T T^{SNP},$$

which gives

$$T = (\mathcal{P}^T)^{-1} (\mathcal{P}^{SM})^T T^{SM} + \sigma (\mathcal{P}^T)^{-1} (\mathcal{P}^{NP})^T T^{SNP}, \qquad (45)$$

where $\mathcal{P}$ is the response matrix used in the provided
unfolding.

It is reasonable for a theorist to assume that
$\mathcal{P} = \mathcal{P}^{SM}$, since usually SM MC is used to
construct $\mathcal{P}$ for the unfolding. As long as the
theorist doesn't want to assume his own detector model,
$(\mathcal{P}^T)^{-1} (\mathcal{P}^{SM})^T = (\mathcal{P}^T)^{-1} \mathcal{P}^T = I$,
so,

$$T = T^{SM} + \sigma (\mathcal{P}^T)^{-1} (\mathcal{P}^{NP})^T T^{SNP}. \qquad (46)$$

Regarding $\mathcal{P}^{NP}$, it is reasonable to assume that the
signal and the SM are subject to the same detector energy
resolution and other hardware sources of smearing which are
represented in $\mathcal{P}$. However, depending on the character
of the signal, the efficiency of being reconstructed may not be
the same as for SM. The difference could be an overall
multiplicative factor, e.g., some branching ratio, but it could
also vary from bin to bin. The theorist needs to check, or at
least argue, whether the following is true:

$$P(r|t) = P^{NP}(r|t), \quad \forall (r,t). \qquad (47)$$

To answer, he needs to have a MC generator 
and detector simulation for his NP, from which 
he can estimate P NP (r\t), and he needs to be 
supplied with the elements of the V used in 
the unfolding, to compare the elements of the 
two matrices. So, one needs to be aware that 
the plain answer from unfolding is not enough 
to be sure it is interpreted correctly. Unfor- 
tunately, some kind of detector model is still 
needed by the theorist to obtain V NP , and the 
experimentalists need to publish the elements 
of V. 
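The comparison of Eq. 47 could be sketched as follows. This is only an illustration: it assumes that P(r|t) is estimated from MC as the number of events generated in truth bin t and reconstructed in bin r, divided by the number generated in bin t; the function names, the tolerances and the toy counts are hypothetical, and MC statistical uncertainties are ignored.

```python
import numpy as np

def response_from_counts(n_truth_reco, n_generated):
    """Estimate P(r|t) = N(t, r) / N_gen(t) from MC event counts.

    n_truth_reco[t, r]: events generated in truth bin t, reconstructed in bin r.
    n_generated[t]:     all events generated in truth bin t (reconstructed or not).
    """
    counts = np.asarray(n_truth_reco, dtype=float)
    gen = np.asarray(n_generated, dtype=float)
    return counts / gen[:, None]

def same_response(V, V_np, rel_tol=0.05, abs_tol=1e-3):
    """Element-wise check of Eq. 47 within a chosen (hypothetical) tolerance."""
    return bool(np.allclose(V_np, V, rtol=rel_tol, atol=abs_tol))

# Toy counts: the SM response published with the unfolding.
V = response_from_counts([[800, 100], [150, 700]], [1000, 1000])

# A signal with the same smearing and efficiency as the SM.
V_np_same = response_from_counts([[400, 50], [75, 350]], [500, 500])

# A signal with the same smearing but half the reconstruction efficiency.
V_np_half = response_from_counts([[400, 50], [75, 350]], [1000, 1000])

ok_same = same_response(V, V_np_same)   # Eq. 47 holds
ok_half = same_response(V, V_np_half)   # Eq. 47 fails
```

In a real comparison the tolerance would have to reflect the statistical precision of both MC samples.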

If one is convinced that V = V^NP as well, then Eq. 46 is further simplified to

$$ T = T^{\rm SM} + \sigma\,T^{\rm SNP}. \tag{48} $$
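As a hedged illustration of Eqs. 45, 46 and 48, the sketch below maps an assumed σ to the truth-level spectrum T that yields the same expectation R. It assumes a square, invertible response matrix stored as V[t, r] = P(r|t), so that R = B + V^T T; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def equivalent_truth(sigma, V, V_sm, V_np, T_sm, T_snp):
    """Solve V^T T = (V_sm)^T T_sm + sigma (V_np)^T T_snp for T (Eq. 45).

    All matrices are indexed as M[t, r] = P(r|t).
    V must be square and invertible (an assumption of this sketch).
    """
    rhs = V_sm.T @ T_sm + sigma * (V_np.T @ T_snp)
    # Solving the linear system avoids forming (V^T)^-1 explicitly.
    return np.linalg.solve(V.T, rhs)

# Toy 3-bin example (hypothetical numbers).
V = np.array([[0.8, 0.1, 0.0],
              [0.1, 0.7, 0.1],
              [0.0, 0.1, 0.8]])
T_sm = np.array([100.0, 50.0, 20.0])
T_snp = np.array([0.0, 10.0, 5.0])

# With V = V_sm = V_np, Eq. 45 reduces to Eq. 48: T = T_sm + sigma * T_snp.
T_same = equivalent_truth(2.0, V, V, V, T_sm, T_snp)

# A signal with half the reconstruction efficiency (V_np = 0.5 V) enters
# with half the weight, as Eq. 46 implies.
T_half = equivalent_truth(2.0, V, V, 0.5 * V, T_sm, T_snp)
```

The second call shows why Eq. 47 matters: an overall efficiency difference between signal and SM changes the T at which Eq. 43 has to be evaluated.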

The procedure to arrive at p(σ|D) is to assume an array of σ values, then for
each σ compute the T of Eq. 45 (or 46 or 48, if allowed), then insert this T
into Eq. 43 to calculate L(D|T) up to a constant, and finally insert this
L(D|T) into Eq. 42 to compute p(σ|D) up to a constant, assuming the prior π(σ).
When this is done for all σ values, the function p(σ|D) will not be normalized
to 1 yet, so this needs to be done eventually.
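The procedure just described can be sketched in a few lines. This is a minimal illustration, not the implementation used for the results of this document: it assumes that Eq. 48 is allowed, that log p(T|D) and log π(T) are available as functions (in practice p(T|D) would have to be estimated from the FBU output), and it uses a toy Gaussian posterior with flat priors. All names and numbers are hypothetical.

```python
import numpy as np

def sigma_posterior(sigmas, T_sm, T_snp, log_post_T, log_prior_T, log_prior_sigma):
    """Evaluate p(sigma|D) on a grid of sigma values (Eqs. 48, 43 and 42)."""
    log_p = np.empty(len(sigmas))
    for i, s in enumerate(sigmas):
        T = T_sm + s * T_snp                        # Eq. 48
        log_like = log_post_T(T) - log_prior_T(T)   # Eq. 43, up to a constant
        log_p[i] = log_like + log_prior_sigma(s)    # Eq. 42, up to a constant
    p = np.exp(log_p - log_p.max())                 # subtract max to avoid underflow
    norm = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(sigmas))  # trapezoidal rule
    return p / norm                                 # final normalization to 1

# Toy inputs: a Gaussian p(T|D) centered on T_sm + 2 * T_snp.
T_sm = np.array([100.0, 50.0, 20.0])
T_snp = np.array([0.0, 10.0, 5.0])
center = T_sm + 2.0 * T_snp
width = np.array([10.0, 7.0, 5.0])

def toy_log_post_T(T):
    return -0.5 * np.sum(((T - center) / width) ** 2)

sigmas = np.linspace(0.0, 5.0, 501)
p_sigma = sigma_posterior(sigmas, T_sm, T_snp, toy_log_post_T,
                          lambda T: 0.0,   # flat pi(T): no regularization
                          lambda s: 0.0)   # flat pi(sigma)
```

In this toy, p(σ|D) peaks at σ = 2 by construction, and the grid must extend far enough for the normalization to be meaningful.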

8.2 Hypothesis testing 

In Sec. 8.1 it was shown how to compute the probability of any hypothetical NP
cross-section σ, using the output of FBU. Bayesian hypothesis comparison is
then straightforward, and consists in comparing the posterior probabilities of
two competing hypotheses, of which one could be just the SM hypothesis. More
discussion is given in Ref. [10].

For a frequentist hypothesis test 8 at truth-level, it is possible to define
the following null hypothesis: The given truth-level spectrum T′ is pulled from
the posterior of FBU, p(T|D). This T′ could be the prediction of a theorist who
is curious whether it is consistent with the posterior about T.

8 For a detailed introduction to frequentist hypothesis testing, see Ref. [9], and Ref. [10] for some fair criticism.

A test statistic can easily be defined using truth-level quantities:

$$ \chi^2(T) \equiv -\log p(T|D). \tag{49} $$

The corresponding p-value is the probability that a T sampled from p(T|D) would
be at least as unlikely, according to p(T|D), as the given spectrum T′. Namely,

$$ p\text{-value} = P\left(\chi^2(T) > \chi^2(T') \,\middle|\, T \sim p(T|D)\right), \tag{50} $$

where T ~ p(T|D) means "T is randomly sampled from p(T|D)".
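Eq. 50 can be estimated by Monte Carlo, given points sampled from p(T|D) and a way to evaluate log p(T|D) up to a constant. The sketch below assumes both are available; the function name is hypothetical, and a two-bin standard normal is used as a toy stand-in for the posterior because its exact p-value is known.

```python
import numpy as np

def truth_level_p_value(log_post, T_prime, samples):
    """Monte Carlo estimate of Eq. 50.

    log_post: vectorized log p(T|D), up to a constant; maps (N, d) -> (N,).
    T_prime:  the proposed truth-level spectrum, shape (d,).
    samples:  points sampled from p(T|D), shape (N, d).
    """
    chi2_ref = -log_post(np.atleast_2d(T_prime))[0]   # Eq. 49 at the proposed spectrum
    chi2 = -log_post(samples)                         # Eq. 49 at each sampled T
    return float(np.mean(chi2 > chi2_ref))

# Toy posterior: standard normal in two truth bins.
def toy_log_post(T):
    return -0.5 * np.sum(np.asarray(T) ** 2, axis=-1)

rng = np.random.default_rng(1)
samples = rng.standard_normal((200_000, 2))

p_val = truth_level_p_value(toy_log_post, np.array([2.0, 0.0]), samples)
# For this toy the exact value is exp(-2), about 0.135; the estimate should
# agree with it within the statistical precision of the sample.
```

The additive constant in log p(T|D) cancels in the comparison, so an unnormalized posterior is sufficient.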

p-values are notoriously easy to misinterpret. The following is an attempt to
explain how this p-value should be understood (adapted from [9]).

First, let's imagine a robot, Rejectron, which can utter only this sentence
that rejects the null hypothesis: "THIS T′ IS NOT PULLED FROM p(T|D)". It says
this mechanically whenever a T′ with p-value ≤ α is presented to it. A knob on
its chest adjusts the value of α ∈ [0, 1]. If infinitely many spectra are
presented to it, and they follow p(T|D), then the robot is guaranteed to make a
false rejection of the null hypothesis with frequency α.

Now, let's say that the specific T′ a theorist proposes has p-value = γ. If the
robot's α is less than γ, then the robot will stay silent. If α = γ then the
robot will reject the null hypothesis. If α > γ, the robot will still reject
the null hypothesis, but its false rejection probability in an infinite
ensemble of spectra pulled from p(T|D) would be larger than γ. So, the minimum
value one could set α to, and still reject the null hypothesis when T′ is
presented to the robot, is γ. So, the p-value of T′ is the minimum possible
false-rejection frequency (i.e., Type-I error rate) of a robot (i.e., a
decision algorithm) which rejects the null hypothesis when T′ is presented to
it.
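The robot's guaranteed false-rejection frequency can be checked numerically. The following is a toy sketch under stated assumptions: the posterior is replaced by a two-bin standard normal, p-values are estimated against a finite reference sample, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def chi2(T):
    # -log p(T|D) up to a constant, for a toy standard normal posterior.
    return 0.5 * np.sum(T ** 2, axis=-1)

# Reference ensemble used to evaluate p-values (sorted for fast lookup).
reference = np.sort(chi2(rng.standard_normal((100_000, 2))))

# Spectra presented to the robot; they really are pulled from the posterior,
# so every rejection is a false rejection.
presented = chi2(rng.standard_normal((100_000, 2)))

# p-value of each presented spectrum: fraction of reference points whose
# chi2 is larger (Eq. 50).
n_larger = reference.size - np.searchsorted(reference, presented, side="right")
p_values = n_larger / reference.size

alpha = 0.05  # the knob on the robot's chest
false_rejection_rate = float(np.mean(p_values <= alpha))
# The rate should be close to alpha, up to statistical fluctuations.
```

Turning the knob to any other α should give a false-rejection rate close to that α, which is the sense in which the p-value characterizes the decision algorithm rather than the proposed spectrum.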

So, is this p-value saying how likely T′ is to originate from p(T|D)? No,
although this is a common misinterpretation of p-values. Is it the probability
that T = T′? Obviously not. This p-value says more about Rejectron than about
T′.

If the theorist wants to know how likely his theory is, according to the result
of FBU, he can define a volume in the vicinity of T′ that he considers
representative of his theory, and integrate p(T|D) in that volume. An
integration is necessary, since individual T points are not assigned a
probability, but a probability density. As explained in Sec. 8.1, this is only
correct as long as his theory does not involve a different migration model
(V^NP) from the migration model used to derive the unfolding (V).
Interestingly, this should be the case when T′ is the SM, and the V used in
unfolding was also derived from SM MC. So, it is straightforward for a theorist
to compute the probability that SM is true by integrating p(T|D) in a volume
around T^SM which represents the theoretical uncertainty in the SM prediction,
i.e., uncertainty from higher order terms, parton distribution uncertainty,
etc.
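When p(T|D) is available as a set of sampled points, the integral over a volume reduces to counting the points that fall inside it. The sketch below assumes a box-shaped volume, which is only one possible choice for representing a theoretical uncertainty; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def probability_in_box(samples, center, half_width):
    """Fraction of posterior samples inside a box around `center`.

    samples:    points sampled from p(T|D), shape (N, d).
    center:     predicted spectrum, shape (d,).
    half_width: half-size of the box in each truth bin, shape (d,).
    """
    inside = np.all(np.abs(samples - center) <= half_width, axis=1)
    return float(np.mean(inside))

# Toy posterior: standard normal in two truth bins.
rng = np.random.default_rng(3)
samples = rng.standard_normal((200_000, 2))

prob = probability_in_box(samples, np.array([0.0, 0.0]), np.array([1.0, 1.0]))
# For this toy the exact value is 0.6827**2, about 0.466.
```

The result depends on the size and shape of the volume, so these have to be stated together with the probability.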

9 Conclusion 

A fully bayesian unfolding (FBU) method is 
formulated, and presented in numerous exam- 
ples. 

To conclude, some observations made in 
Sec. 6 and 7 will be summarized concisely. 
Then, some final recommendations will be 
given in Sec. 9.2. 

9.1 Summary of observations 

• The asymmetric, non-Gaussian shape of 
p(T|D) is evident in truth-level bins with 
low statistics. 

• No smearing leads to no correlations, and 
p(T|D) is maximized at T = D. 

• Smearing increases the spread of p(T|D), and introduces correlations.

• Large smearing, even when event counts are large, can cause p(T|D) to be
non-Gaussian, due to the T_t > 0 boundary.

• By reducing the volume of the sampled hyper-box, uniform sampling is fast
even in FBU with large N_t.

• MCMC is much more efficient, and qualitatively it gives very similar results,
but it is not as accurate, due to anomalies in the marginal distributions of
p(T|D).

• It is possible to infer T_t in bins where there are no data, provided that
migrations are possible from these bins into regions where data exist.

• Expected bumps can become sharper with FBU, without regularization, provided
that the smearing is not big enough to make the uncertainties comparable to the
bump itself.

• Regularization can affect correlations.

• Regularization can cause secondary maxima in the posterior.

• In the presence of smearing, there is more potential for regularization to
reduce the posterior's spread.

• Regularization doesn't help make an unexpected bump sharper. It either
obscures it, or it maintains the width it has in the data after smearing.

• Unfolding without regularization reconstructs unexpected bumps with the right
width, but the posterior is too spread out, which may obscure the bump unless
it is enormous.

• Regularization doesn't necessarily make an expected bump sharper, unless π(T)
is tailored to the MC truth spectrum (e.g., with a Gaussian constraint), and
the MC truth is close to the actual truth.
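The first two observations of this list can be reproduced in a single-bin toy. The sketch assumes no smearing, no background and a constant prior, so that p(T|D) is proportional to the Poisson probability T^D e^(-T) / D!; the function names and the choice D = 3 are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def single_bin_posterior(D, T_grid):
    """p(T|D) on a grid for one bin: Poisson likelihood times a constant prior."""
    log_p = D * np.log(T_grid) - T_grid   # log(T^D e^-T); the D! term is constant
    p = np.exp(log_p - log_p.max())
    return p / trapezoid(p, T_grid)

T_grid = np.linspace(1e-6, 40.0, 40001)
D = 3                                     # a low-statistics bin
p = single_bin_posterior(D, T_grid)

mode = T_grid[np.argmax(p)]               # maximum of the posterior, at T = D
mean = trapezoid(T_grid * p, T_grid)      # close to D + 1: the right tail is longer
```

The posterior is maximized at T = D, while its mean lies above the mode, which is the asymmetry seen in low-statistics bins.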



9.2 Final Recommendations 

• An unfolded spectrum is not "corrected data", and should not be thought of
in such terms.

• Unfolding is a non-parametric inference 
procedure, and it should be avoided un- 
less it, per se, is the end goal of the anal- 
ysis. 

• If the goal is to set a limit or estimate an 
unknown parameter, or to test a hypoth- 
esis, it is much simpler to use the actual 
data (D) for this. 

• If unfolding is used, all choices and de- 
tails must be presented transparently. It 
is not enough to say "an unfolding tech- 
nique was used." The regularization con- 
dition (i.e. the prior) needs to be pub- 
lished, as well as the elements of V. 

• A result with no regularization (constant 
prior) has special properties and should 
be shown by default, even if some regu- 
larization is used eventually. Alternative 
regularization choices are encouraged if 
believed to be reasonable. The prior, 
thus the regularization, is assumed sub- 
jective and informative, although special 
"non-informative" priors can be consid- 
ered as well. 

• Reporting the result of some unfolding 
procedure does not justify omitting the 
data (D). This is hard to emphasize 
enough. Unfolding does not replace the 
data.

Figure 1: (a): The distribution, in observable m, that MC events follow
(Eq. 11). The dotted lines indicate the delimiters of m bins. (b): An example
of generated MC events. The red solid line shows the distribution of 10^7 MC
events, each of weight 10^-3, before smearing. The dashed black line shows
their distribution after smearing. The markers represent observed data, which
are random numbers following a Poisson distribution of mean given by the
reco-level spectrum.

Figure 2: (a) The migrations matrix, P(truth, reco), for smearing defined by
Eq. 14 with a = 0.5 and b = 0.1. (b) The matrix of P(r|t) for the same
smearing. (c) The migrations matrix efficiency of Eq. 6 for the same smearing.
The dotted lines indicate the bin delimiters in truth-level and reco-level m.

Figure 3: Input truth-level, reco-level, and data spectrum of the example in
Sec. 6.1. Due to assuming no smearing, the truth and the reco spectra coincide.
The green area shows the initial sampled hyper-box (Sec. 5.1).

Figure 5: For the example of Sec. 6.1 the following are shown: The
2-dimensional p_{1,2}(T_1, T_2|D), in gray. The cyan dotted lines cross at
(⟨T_1⟩, ⟨T_2⟩), where ⟨T_t⟩ = ∫ T_t p(T|D) dT, and their half-width is equal to
the RMS of p_t(T_t|D) for t = {1, 2}. The green dotted box shows the limits of
the sampled region. The red dashed lines cross at the correct values of
(T_1, T_2). The black circle corresponds to the observed data (D_1, D_2). The
blue dashed lines and the empty blue square marker indicate the content of the
unfolded spectrum.


[Figure 4 plot: horizontal axis "truth bin"; legend: Sampled region, Unfolded, Truth.]

Figure 4: The unfolded spectrum, compared to the truth spectrum, in the example
of Sec. 6.1.

Figure 6: Input spectra of the example in Sec. 6.2.

Figure 7: The unfolded spectrum, compared to the truth spectrum, in the example
of Sec. 6.2.

Figure 8: Corresponding to Fig. 5, but for the example in Sec. 6.2.

Figure 9: The migrations matrix (left) and efficiency (right) resulting from
setting the smearing parameter b to 0.1 (a), 0.3 (b), 0.5 (c), and 0.8 (d), as
in Sec. 6.3.

Figure 10: The inferred p(T|D) for the smearing parameter b set to 0.1 (a), 0.3 (b), 0.5 (c), and
0.8 (d), as in Sec. 6.3. The markers and lines are explained in Fig. 5.

Figure 11: Marginal distributions p_1(T_1|D) (a), and p_2(T_2|D) (b), for b set to 0.1 (a), 0.3 (b), 0.5
(c), and 0.8 (d), as in Sec. 6.3.

Figure 12: In (a): The truth, reco, and data spectrum of the example in Sec. 6.4. These are the
same spectra shown in Fig. 1(b), except that here the initial sampling region is overlaid, defined
by Eq. 24. In (b): The initial sampling region (called "old") is shown and compared to the "new"
hyper-box, which is found by volume reduction, according to Sec. 5.4. Instead of showing the
observable m in the horizontal axis, in (b) the horizontal axis shows simply the index of each bin.

Figure 13: The 1-dimensional marginal distributions of p(T|D) in the example of Sec. 6.4. The
yellow distribution is p_t(T_t|D). The red cross marker shows the actual truth spectrum content
in each bin (T_t). The black circle marker shows the observed data in each bin (D_t). The blue
dashed line and the blue square marker show the unfolded spectrum: the bounds of its interval
and its central value U_t. The green dotted line shows the range in T_t that is included in the
sampled hyper-box (after volume reduction).

Figure 14: The unfolded spectrum of the example in Sec. 6.4. The relative difference is plotted in
the inset of (a), but due to the large error bars it is hard to see the difference in the first truth bins.
In (b), the difference U_t − T_t is divided by the error bars of the unfolded spectrum.
Only bins 1 and 3 have T_t outside of the interval of the unfolded spectrum. In (c), the relative
error bars of the unfolded spectrum are shown.

Figure 16: The 1-dimensional marginal distributions of p(T|D) in the example of Sec. 6.4.1.

Figure 17: The 1-dimensional marginal distributions of p(T|D) in the example of Sec. 6.4.1,
sampling the reduced ("new") volume in Fig. 12(b). More anomalies are observed than in Fig. 16.

Figure 18: The unfolded spectrum of the example in Sec. 6.4.1. In (a), (b) and (c) the result is
obtained without reducing the volume of the initial sampled hyper-box, and in (d), (e) and (f)
the initial hyper-box has been reduced. The unfolded spectrum doesn't change much with volume
reduction, even though in Fig. 17 it is seen that this introduces some anomalies.



Figure 19: In (a): The truth-level spectrum generated (red), the corresponding reco-level spectrum
(black dashed), and the pseudo-data (black markers) for the example of Sec. 6.5. (b): The
corresponding migrations matrix, of dimension N_t × N_r = 14 × 5. (c): The efficiency of the
migrations matrix.

Figure 20: Initial sampled hyper-box ("old"), and hyper-box with reduced volume ("new")
corresponding to Sec. 6.5.

Figure 21: The 1-dimensional marginal distributions of p(T|D) in the example of Sec. 6.5.

Figure 22: The unfolded spectrum of the example in Sec. 6.5. As Fig. 21 makes clear, the T_t bins
t = {1, 2, 3, 10, 11, 12, 13, 14} are unconstrained by the data, which means that the unfolded spectrum
content in these bins is a mere reflection of the prior, namely of the sampled region.

Figure 23: The migrations matrix (left); data, reco and truth spectra (middle), where "Actual
truth" and "MC truth" denote the actual and the MC truth-level spectra, which coincide in this
case; and truth and unfolded spectra (right).
Each row corresponds to a = {0, 50, 75, 100, 150}. Details in Sec. 6.6.1.

Figure 24: Same as Fig. 23, but with a narrower truth-level bump. Details are given in Sec. 6.6.1.

Figure 25: The migrations matrix (left); MC reco, MC truth, actual truth, and the data,
which contain a bump unknown to the MC (middle); and truth (T) and unfolded spectra (right).
Each row corresponds to a = {0, 50, 75, 100, 150}. Details in Sec. 6.6.2.

Figure 26: Same as Fig. 25, but with a narrower unexpected bump in the data. Details are given
in Sec. 6.6.2.

Figure 27: The input data, truth-level, and reco-level spectrum used in the examples of Sec. 7.1.1.
Since no smearing is modeled, the reco spectrum is identical to the truth, and since no unexpected
processes are assumed, the actual truth is identical to the MC truth spectrum. The sampled
region is shown, which is sampled with MCMC, without need for volume reduction.

Figure 28: The posterior P(S_1(T)|D), for three different choices of the regularization parameter α,
corresponding to Sec. 7.1.1. For α = 0, p(T|D) is unaffected by regularization. As the regularization
constraint becomes stronger, the posterior p(T|D) is "pushed" towards T-points which give smaller
S_1(T). The sampling method is MCMC, and the small tails in (b) and (c) reflect the part
of the MCMC random walk before it had reached the vicinity of the most likely T.

Figure 29: The result of unfolding of Sec. 7.1.1, with regularization function S_1, for three values
of α. The upper row (a,b,c) demonstrates that the bias of the unfolded spectrum increases with
increasing regularization strength. The lower row (d,e,f) shows the relative uncertainty of the
unfolded spectrum, which is expected to reduce with increasing α. There is a tiny
reduction for α = 10^3, at the cost of considerable bias. For α = 3 × 10^3, the posterior is forced
to be more constant, which maximizes entropy, but is limited by the upper edge of the sampled
hyper-box, so the apparent reduction of uncertainty is simply because the posterior is squeezed
against that edge (see Fig. 35, 3rd row).

Figure 30: The posterior P(S_2(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.1.1.




Figure 31: The result of unfolding of Sec. 7.1.1, with regularization function S_2, for three values of
α. The first bins are more affected by this regularization. Very small improvement is observed in
the uncertainty of the unfolded spectrum.

Figure 32: The posterior P(log_10 S_3(T)|D), for three different choices of the regularization
parameter α, corresponding to Sec. 7.1.1.

Figure 33: The result of unfolding of Sec. 7.1.1, with regularization function S_3, for three values of
α.

Figure 34: The result of unfolding of Sec. 7.1.1, with a Gaussian regularization constraint, for three
values of α (see Sec. 7).

Figure 35: Some 1-dimensional distributions from Sec. 7.1.1. The columns show p_t(T_t|D) with t =
{1, 2, 9, 12, 14}. The rows correspond to regularization with (S, α) = {(S_1, 0), (S_1, 1 × 10^3),
(S_1, 3 × 10^3), (S_2, 6 × 10^-4), (S_3, 20), (S_3, 40)}, in this order.

Figure 36: Some 2-dimensional distributions from Sec. 7.1.1. The columns show p_{t1,t2}(T_t1, T_t2|D)
with (t_1, t_2) = {(1, 2), (3, 4), (7, 8), (9, 10), (12, 14)}. The rows correspond to regularization with
(S, α) = {(S_1, 0), (S_1, 1 × 10^3), (S_1, 3 × 10^3), (S_2, 6 × 10^-4), (S_3, 20), (S_3, 40)}, in this order.

Figure 38: The result of unfolding of Sec. 7.1.2, with regularization function S_1, for three values of
α.

Figure 39: The posterior P(S_2(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.1.2.




Figure 40: The result of unfolding of Sec. 7.1.2, with regularization function S_2, for three values of
α.

Figure 41: The posterior P(S_3(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.1.2.

Figure 42: The result of unfolding of Sec. 7.1.2, with regularization function S_3, for three values of
α.

Figure 44: From the posteriors computed in Sec. 7.1.2, using regularization with S_3 and three values
of α, twenty random T-points are sampled following each posterior, and are overlaid (colored lines)
with the sampled hyper-box (gray region) and the actual truth-level spectrum (red histogram). In
(a) the sampled T-points are not regularized, in (b) they are more constant in intervals of bins, and
in (c) they tend to be even flatter, and at least two families of spectra seem to be favored, which
have their flat regions in different ranges of bins.

Figure 45: The result of unfolding of Sec. 7.1.2, with a Gaussian regularization constraint, for three
values of α (see Sec. 7).

Figure 46: (a) The MC truth-level spectrum (without the bump), the actual truth-level spectrum
(with the bump), the reconstructed spectrum which corresponds to the MC truth after smearing,
the data which follow the actual truth after smearing, and the sampled hyper-box used in Sec. 7.2.
(b) The migrations matrix, populated with the MC events that compose the MC truth-level and
the reco spectrum of (a).




Figure 47: The posterior P(S_1(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.2.

Figure 48: The result of unfolding of Sec. 7.2, with regularization function S_1, for three α values.

Figure 49: The posterior P(S_2(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.2.

Figure 50: The result of unfolding of Sec. 7.2, with regularization function S_2, for three α values.

Figure 51: The posterior P(S_3(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.2.

Figure 52: The result of unfolding of Sec. 7.2, with regularization function S_3, for three α values.

Figure 53: The result of unfolding of Sec. 7.2, with Gaussian regularization, for three α values.

Figure 54: (a) The MC truth-level spectrum and the actual truth-level spectrum, which are
identical here, the reconstructed spectrum which corresponds to the MC truth after smearing, the
data which follow the actual truth after smearing, and the sampled hyper-box used in Sec. 7.3.
(b) The migrations matrix, populated with the MC events that compose the MC truth-level and
the reco spectrum of (a).

Figure 55: The posterior P(S_1(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.3.



[Figure 56 plots: horizontal axis "truth bin"; panels (a)-(c) and (d)-(f), with (b, e) α = 10^3 and (c, f) α = 3 × 10^3.]

Figure 56: The result of unfolding of Sec. 7.3, with regularization function S_1, for three α values.

Figure 57: The posterior P(S_2(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.3.

Figure 58: The result of unfolding of Sec. 7.3, with regularization function S_2, for three α values.

Figure 59: The posterior P(S_3(T)|D), for three different choices of the regularization parameter
α, corresponding to Sec. 7.3.

Figure 60: The result of unfolding of Sec. 7.3, with regularization function S_3, for three α values.

Figure 61: The result of unfolding of Sec. 7.3, with Gaussian regularization, for three α values.